BLEOMYCIN GENE CLUSTER COMPONENTS AND THEIR USES

CROSS-REFERENCE TO RELATED APPLICATIONS 

This application claims benefit under 35 U.S.C. §119 of provisional
applications USSN 60/115,435, filed on January 6, 1999, and USSN 60/118,848, filed on
February 5, 1999, both of which are herein incorporated by reference in their entirety for all
purposes.

STATEMENT AS TO RIGHTS TO INVENTIONS MADE UNDER FEDERALLY 
SPONSORED RESEARCH AND DEVELOPMENT 

This work was supported in part by an Institutional Research Grant from the
American Cancer Society and the School of Medicine, University of California, Davis,
National Institutes of Health Grant Number A140475, and a grant from the Searle Scholars
Program of the Chicago Community Trust. The Government of the United States of
America may have certain rights in this invention.

FIELD OF THE INVENTION 

This invention relates to the field of polyketide synthesis and nonribosomal
polypeptide synthesis. In particular, this invention pertains to the isolation of the bleomycin
gene cluster, which encodes the first identified hybrid polyketide synthase/nonribosomal
peptide synthetase pathway.

BACKGROUND OF THE INVENTION 

Polyketides and nonribosomal peptides are two large families of natural
products that include many clinically valuable drugs, such as erythromycin and vancomycin
(antibacterial), FK506 and cyclosporin (immunosuppressant), and epothilone and bleomycin
(BLM) (antitumor). The biosyntheses of polyketides and nonribosomal peptides are
catalyzed by polyketide synthases (PKSs) (Hopwood (1997) Chem. Rev. 97: 2465; Katz
(1997) Chem. Rev. 97: 2557; Khosla (1997) Chem. Rev. 97: 2577; Ikeda and Omura
(1997) Chem. Rev. 97: 2591; Staunton and Wilkinson (1997) Chem. Rev. 97: 2611; Cane et
al. (1998) Science 282: 63) and nonribosomal peptide synthetases (NRPSs) (Cane et
al. (1998) Science 282: 63; Marahiel et al. (1997) Chem. Rev. 97: 2651; von Dohren et al.
(1997) Chem. Rev. 97: 2675), respectively. Remarkably, PKSs and NRPSs use a very
similar strategy for the assembly of these two distinct classes of natural products by
sequential condensation of short carboxylic acids and amino acids, respectively, and utilize
the same 4'-phosphopantetheine prosthetic group, via a thioester linkage, to channel the
growing polyketide or peptide intermediate during the elongation processes.
Both type I PKSs and NRPSs are multifunctional proteins that are organized
into modules. (A module is defined as a set of distinctive domains that encode all the
enzyme activities necessary for one cycle of polyketide or peptide chain elongation and
associated modifications.) The number and order of modules and the type of domains within
a module on each PKS or NRPS protein determine the structural variations of the resulting
polyketide and peptide products by dictating the number, order, choice of the carboxylic acid
or amino acid to be incorporated, and the modifications associated with a particular cycle of
elongation. These features of PKS and NRPS inspired us to search for a hybrid PKS and
NRPS system. Since the modular architecture of both PKS (Cane et al. (1998) Science 282:
63; Katz and Donadio (1993) Ann. Rev. Microbiol. 47: 875; Hutchinson and Fujii
(1995) Ann. Rev. Microbiol. 49: 201) and NRPS (Cane et al. (1998) Science 282: 63;
Stachelhaus et al. (1995) Science 269: 69; Stachelhaus et al. (1998) Mol. Gen. Genet. 257:
308; Belshaw et al. (1999) Science 284: 486) has been exploited successfully in
combinatorial biosynthesis of diverse "unnatural" natural products, it is imagined that a
hybrid PKS and NRPS system, capable of incorporating both carboxylic acids and amino
acids into the final products, could lead to even greater chemical structural diversity.
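The relationship between module order and product structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Module` class, the minimal "core" domain sets, and the three example modules are hypothetical and are not taken from the bleomycin cluster itself.

```python
from dataclasses import dataclass

# Assumed minimal domain sets for one elongation cycle (illustrative only).
NRPS_CORE = {"C", "A", "PCP"}   # condensation, adenylation, peptidyl carrier
PKS_CORE = {"KS", "AT", "ACP"}  # ketoacyl synthase, transferase, acyl carrier


@dataclass(frozen=True)
class Module:
    name: str        # hypothetical module label
    domains: tuple   # ordered domain abbreviations
    substrate: str   # building block incorporated by this module

    def kind(self) -> str:
        """Classify the module by which core domain set it contains."""
        present = set(self.domains)
        if PKS_CORE <= present:
            return "PKS"
        if NRPS_CORE <= present:
            return "NRPS"
        return "incomplete"


def predicted_backbone(modules):
    """One building block per module, in module order."""
    return [m.substrate for m in modules]


# Hypothetical three-module hybrid assembly line.
assembly_line = [
    Module("M1", ("C", "A", "PCP"), "amino acid 1"),
    Module("M2", ("KS", "AT", "KR", "ACP"), "carboxylic acid 1"),
    Module("M3", ("C", "A", "PCP"), "amino acid 2"),
]

print([m.kind() for m in assembly_line])  # ['NRPS', 'PKS', 'NRPS']
print(predicted_backbone(assembly_line))
```

Reordering, adding, or removing entries in `assembly_line` changes the predicted backbone, which is the property that combinatorial biosynthesis exploits.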
The BLMs, differing structurally at the C-terminal amines of the
glycopeptides, are a family of antibiotics produced by Streptomyces verticillus (Sv). BLMs
exhibit strong antitumor activity through a metal-dependent oxidative cleavage of DNA or
RNA in the presence of molecular oxygen and are incorporated into current chemotherapy of
several malignancies under the trade name Blenoxane®, which contains BLM A2 and BLM
B2 as the principal constituents (Sikic et al. Eds. (1985) Bleomycin Chemotherapy,
Academic Press, New York; Natrajan and Hecht (1994) pages 197-242 In: Molecular
Aspects of Anticancer Drug-DNA Interaction Vol. 2, Neidle and Waring Eds., Macmillan,
London). Umezawa, Fujii, Takita, and co-workers extensively studied the biosynthesis of
BLM in Sv ATCC15003 by feeding isotope-labeled precursors and by isolating various
biosynthetic intermediates and shunt metabolites, establishing that the BLMs are in fact
natural hybrid metabolites of polyketide and peptide biosynthesis (Takita and Muroka (1990)
pages 289-309 In: Biochemistry of Peptide Antibiotics: Recent Advances in the
Biotechnology of β-Lactams and Microbial Peptides, Kleinkauf and Von Dohren Eds., W. de
Gruyter, New York). On the assumption that BLM biosynthesis follows the paradigm for
peptide and polyketide biosynthesis, we predict that the Blm megasynthetase, which
catalyzes the assembly of the BLM backbone from nine amino acids and one acetate, should
bear the characteristics of both NRPS and PKS, providing an excellent model to study the
mechanism by which NRPS and PKS could be integrated into a productive biosynthetic
system to synthesize a hybrid peptide and polyketide metabolite (Fig. 1A) (Shen et al. (1999)
Bioorg. Chem. 27: 155).

SUMMARY OF THE INVENTION 
This invention pertains to the isolation and elucidation of the bleomycin gene
cluster. Nucleic acid sequences encoding all of the open reading frames (ORFs) that encode
polypeptides sufficient to direct the biosynthesis of bleomycin are provided. The nucleic
acids can be used in their "native" format or recombined in a wide variety of manners to
create novel synthetic pathways.

In one embodiment, this invention provides an isolated nucleic acid
comprising a nucleic acid selected from the group consisting of a nucleic acid encoding any
one of Blm open reading frames (ORFs) 8 through 41, and/or a nucleic acid encoding a
polypeptide encoded by any one of Blm open reading frames (ORFs) 8 through 41, and/or a
nucleic acid amplified by polymerase chain reaction (PCR) using any one of the primer pairs
identified in Table II and the nucleic acid of a bleomycin-producing organism as a template.
The nucleic acid may comprise one or multiple (e.g. two, more preferably 3 or more)
bleomycin open reading frames (i.e. BLM ORFs 8 through 41). One preferred nucleic acid
comprises a nucleic acid encoding a C domain lacking one or more His residues of the
conserved HHxxxDG active site for transudation. In another preferred embodiment the
nucleic acid comprises a nucleic acid encoding a protein encoded by a gene selected from the
group consisting of blmI, blmII, and blmXI.
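As a rough illustration of how a C domain lacking one or more His residues of the HHxxxDG motif might be recognized in a protein sequence, the following Python sketch scans for seven-residue windows that end in Asp-Gly and carry at least one of the two leading His residues. The function name and the two toy sequences are hypothetical; they are not sequences from the cluster, and a real analysis would rely on a full alignment rather than a regular expression.

```python
import re

# Window of 7 residues: positions 1-2 hold at least one His, then any three
# residues, then Asp-Gly. The lookahead allows overlapping windows to be found.
MOTIF = re.compile(r"(?=((?:H.|.H)...DG))")


def scan_hhxxxdg(protein):
    """Return (start, window, missing_his) for each candidate motif window."""
    results = []
    for match in MOTIF.finditer(protein):
        window = match.group(1)
        missing_his = 2 - window[:2].count("H")
        results.append((match.start(), window, missing_his))
    return results


# Toy sequences (hypothetical, for illustration only).
intact = "AAHHSILDGWAA"
variant = "AAHASILDGWAA"

print(scan_hhxxxdg(intact))   # [(2, 'HHSILDG', 0)]
print(scan_hhxxxdg(variant))  # [(2, 'HASILDG', 1)]
```

A `missing_his` value of 1 flags a window in which one His of the conserved pair has been replaced.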

In another embodiment this invention provides an isolated nucleic acid
encoding a (biosynthetic) module comprising two or more (more preferably three or more,
most preferably four or more) catalytic domains of a protein encoded by a nucleic acid of a
bleomycin gene cluster wherein said catalytic domains are selected from the group consisting
of a condensation (C) domain, an adenylation (A) domain, a peptidyl carrier protein (PCP)
domain, a condensation/cyclization domain (Cy), an acyl-carrier protein (ACP)-like domain,
an oxidization domain (Ox), a ketoacyl synthase (KS) domain, an acetyl transferase (AT)
domain, a ketoreductase (KR) domain, and a methyltransferase (MT) domain. Preferred
nucleic acids comprise a nucleic acid encoding one or more proteins comprising a module
selected from the group consisting of NRPS-0, NRPS-1, NRPS-2, NRPS-3, NRPS-4, NRPS-5,
NRPS-6, NRPS-7, NRPS-8, NRPS-9, and PKS. Particularly preferred nucleic acids
comprise an open reading frame from SEQ ID NO: 1, SEQ ID NO: 2, or SEQ ID NO: 3.
In still another embodiment, this invention provides an isolated nucleic acid
comprising a nucleic acid encoding a protein encoded by a gene from a BLM gene cluster.
Preferred nucleic acids encode a protein encoded by a gene selected from the group
consisting of blmI, blmII, and blmXI. In another embodiment, preferred nucleic acids
encode a protein encoded by a gene selected from the group consisting of blmIII, blmIV,
blmV, blmVI, blmVII, blmIX, and blmX. In still yet another embodiment, the nucleic acid
comprises a nucleic acid encoding a protein encoded by blmVIII. Particularly preferred
nucleic acids comprise a nucleic acid selected from the group consisting of blmI, blmII, and
blmXI. Other particularly preferred nucleic acids comprise a nucleic acid selected from the
group consisting of blmIII, blmIV, blmV, blmVI, blmVII, blmIX, and blmX, while still other
particularly preferred nucleic acids comprise blmVIII.

In still yet another embodiment, this invention provides an isolated nucleic
acid comprising a nucleic acid that encodes a protein comprising at least one catalytic
domain selected from the group consisting of a condensation (C) domain, an adenylation (A)
domain, a peptidyl carrier protein (PCP) domain, a condensation/cyclization domain (Cy), an
acyl-carrier protein (ACP)-like domain, an oxidization domain (Ox), a ketoacyl synthase
(KS) domain, an acetyl transferase (AT) domain, a ketoreductase (KR) domain, and a
methyltransferase (MT) domain, and that hybridizes to a nucleic acid selected from the group
consisting of orf8, orf9, orf10, orf11, orf12, orf13, orf14, orf15, orf16, orf17, orf18,
orf19, orf20, orf21, orf22, orf23, orf24, orf25, orf26, orf27, orf28, orf29, orf30, orf31, orf32,
orf33, orf34, orf35, orf36, orf37, orf38, orf39, and orf40 under stringent conditions. In
certain embodiments this also includes nucleic acids that would stringently hybridize as
indicated above but for the degeneracy of the nucleic acid code. In other words, if silent
mutations could be made in the subject sequence so that it hybridizes to the indicated
sequence(s) under stringent conditions, it would be included in certain embodiments. A
preferred isolated nucleic acid comprises a nucleic acid encoding a module. A particularly
preferred isolated nucleic acid comprises a nucleic acid encoding a BLM gene.
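The point about degeneracy and silent mutations can be made concrete with a short Python sketch. It builds the standard genetic code table and checks whether two coding sequences differ at the nucleotide level while encoding the same peptide. The two 12-nucleotide sequences are invented for illustration and the helper names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated with base order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate a DNA coding sequence, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def is_silent_variant(seq_a, seq_b):
    """True if the sequences differ in nucleotides but encode the same peptide."""
    return seq_a != seq_b and translate(seq_a) == translate(seq_b)


native = "CATCACGGCGAC"   # toy sequence encoding His-His-Gly-Asp
recoded = "CACCATGGTGAT"  # synonymous codons, same peptide

print(translate(native), translate(recoded))  # HHGD HHGD
print(is_silent_variant(native, recoded))     # True
```

Two such sequences may fail to hybridize under stringent conditions even though their encoded polypeptides are identical, which is the situation the passage above is written to cover.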

This invention also provides a nucleic acid comprising a nucleic acid selected
from the group consisting of orf8, orf9, orf10, orf11, orf12, orf13, orf14, orf15,
orf16, orf17, orf18, orf19, orf20, orf21, orf22, orf23, orf24, orf25, orf26, orf27, orf28,
orf29, orf30, orf31, orf32, orf33, orf34, orf35, orf36, orf37, orf38, orf39, and orf40, or an
allelic variant thereof. Preferred nucleic acids comprise a nucleic acid that is a single
nucleotide polymorphism (SNP) of a nucleic acid selected from the group
consisting of orf8, orf9, orf10, orf11, orf12, orf13, orf14, orf15, orf16, orf17, orf18,
orf19, orf20, orf21, orf22, orf23, orf24, orf25, orf26, orf27, orf28, orf29, orf30, orf31, orf32,
orf33, orf34, orf35, orf36, orf37, orf38, orf39, and orf40.

This invention also provides an isolated gene cluster comprising open reading 
frames encoding polypeptides sufficient to direct the assembly of a bleomycin. 

In one embodiment this invention provides an isolated multi-functional 
protein complex comprising both a polyketide synthase (PKS) and a polypeptide synthetase 
(NRPS) and/or an isolated nucleic acid encoding a multi-functional protein complex 
comprising both a polyketide synthase (PKS) and a polypeptide synthetase (NRPS). 

This invention also provides various blm cluster polypeptides or blm cluster- 
derived polypeptides. Thus, in one embodiment this invention provides an isolated 
polypeptide comprising a catalytic domain encoded by a nucleic acid of a bleomycin gene 
cluster wherein said nucleic acid comprises a nucleic acid selected from the group consisting 
of a nucleic acid encoding any one of Blm open reading frames (ORFs) 8 through 41; and/or 
a nucleic acid amplified by polymerase chain reaction (PCR) using any one of the primer 
pairs identified in Table II. Preferred polypeptides comprise an enzymatic domain selected
from the group consisting of a condensation (C) domain, an adenylation (A) domain, a
peptidyl carrier protein (PCP) domain, a condensation/cyclization domain (Cy), an
acyl-carrier protein (ACP)-like domain, an oxidization domain (Ox), a ketoacyl synthase (KS)
domain, an acetyl transferase (AT) domain, a ketoreductase (KR) domain, and a
methyltransferase (MT) domain. Particularly preferred polypeptides are encoded by the
nucleic acids described above and herein.

This invention also provides expression vectors comprising any of the nucleic
acids described herein and/or host cells (e.g. Streptomyces) transfected and/or transformed
with any of these expression vectors. A preferred host cell is transformed with an exogenous
nucleic acid comprising a gene cluster encoding polypeptides sufficient to direct the
assembly of a bleomycin or bleomycin analog.

This invention also provides methods of use of the blm and blm-derived
nucleic acid(s) and/or polypeptides. One such method is a method of chemically modifying
a biological molecule. The method involves contacting a biological molecule that is a
substrate for a polypeptide encoded by one or more bleomycin biosynthesis gene cluster
open reading frames with the polypeptide encoded by one or more bleomycin biosynthesis
gene cluster open reading frames, whereby the polypeptide chemically modifies the
biological molecule. In one particularly preferred embodiment, the biological molecule is an
amino acid and said polypeptide is a peptide synthetase. In another preferred embodiment,
the polypeptide is a methyl transferase. Other substrates and blm encoded polypeptides are
illustrated in Table II.

In another embodiment this invention provides a method of coupling a first
amino acid to a second amino acid. This method involves contacting the first and second
amino acid with a recombinantly expressed bleomycin nonribosomal peptide synthetase
(NRPS). A preferred NRPS is selected from the group consisting of NRPS-5, NRPS-4,
NRPS-3, NRPS-9, NRPS-8, and NRPS-7. Another preferred NRPS is selected from the
group consisting of NRPS-6, NRPS-2, NRPS-1, and NRPS-0. The contacting can be in vivo
(e.g. in a host cell) or ex vivo.

In another embodiment this invention provides a method of coupling a first
fatty acid to a second fatty acid, said method comprising contacting the first and second fatty
acids with a recombinantly expressed bleomycin polyketide synthase (PKS). Again, the
contacting can be in vivo (e.g. in a host cell) or ex vivo.

In still another embodiment, this invention provides a method of producing a 
bleomycin or bleomycin analog. The method involves providing a cell transformed with an 
exogenous nucleic acid comprising a bleomycin gene cluster encoding polypeptides 
sufficient to direct the assembly of said bleomycin or bleomycin analog; culturing the cell
under conditions permitting the biosynthesis of bleomycin or bleomycin analog; and 
isolating said bleomycin or bleomycin analog from said cell. 

This invention also provides an isolated nucleic acid comprising a nucleic
acid encoding a phosphopantetheinyl transferase, said nucleic acid encoding a
phosphopantetheinyl transferase being selected from the group consisting of: a nucleic acid
encoding the protein encoded by the nucleic acid of SEQ ID NO:3; a nucleic acid amplified
by polymerase chain reaction (PCR) using primers that specifically amplify ORF 41
(primers: SEQ ID NO:71 and SEQ ID NO:72) and Streptomyces nucleic acid as a template; a
nucleic acid encoding a polypeptide having phosphopantetheinyl transferase activity where
said nucleic acid specifically hybridizes to the nucleic acid of SEQ ID NO:3 under stringent
conditions. In one embodiment, the nucleic acid comprises the nucleic acid of SEQ ID
NO:3.

In another embodiment, this invention provides a polypeptide comprising a
phosphopantetheinyl transferase encoded by SEQ ID NO:3 or a polypeptide having
phosphopantetheinyl transferase activity and the sequence encoded by the nucleic acid of
SEQ ID NO:3 or conservative substitutions of that polypeptide.

Also provided are vectors comprising a nucleic acid encoding a
phosphopantetheinyl transferase (e.g. as described above) and cells transfected with the
vector.

This invention also provides a method of converting an apo carrier protein to
a holo carrier protein, said method comprising reacting said apo carrier protein with a
recombinant phosphopantetheinyl transferase encoded by SEQ ID NO:3 and coenzyme A,
thereby producing a holo-carrier protein.

In certain embodiments, this invention specifically excludes one or more of
open reading frames 1 through 41. In particularly preferred embodiments, this invention
excludes open reading frames 1 through 7 (Orf 1- Orf 7).

DEFINITIONS

The "polyketide synthases" (PKSs) are multifunctional enzymes
related to fatty acid synthases (FASs). PKSs catalyze the biosynthesis of polyketides through
repeated (decarboxylative) Claisen condensations between acylthioesters, usually
propionyl, malonyl or methylmalonyl. Following each condensation, they introduce
structural variability into the product by catalyzing all, part, or none of a reductive
cycle on the growing polyketide chain. PKSs incorporate enormous structural diversity into
their products, in addition to varying the condensation cycle, by controlling the overall chain
length, choice of primer and extender units and, particularly in the case of aromatic
polyketides, regiospecific cyclizations of the nascent polyketide chain. After the carbon
chain has grown to a length characteristic of each specific product, it is typically released
from the synthase. PKSs thus consist of families of enzymes
which work together to produce a given polyketide. Two general classes of PKSs exist. One
class, known as Type I PKSs, is represented by the PKSs for macrolides such as
erythromycin. These "complex" or "modular" PKSs include assemblies of
multifunctional proteins carrying, between them, a set of separate active sites for each step of
carbon chain assembly and modification (Cortes et al. (1990) Nature 348: 176; Donadio et
al. (1991) Science 252: 675; MacNeil et al. (1992) Gene 115: 119). Structural diversity
occurs in this class from variations in the number and type of active sites in the PKSs. This
class of PKSs displays a one-to-one correlation between the number and clustering of active
sites in the primary sequence of the PKS and the structure of the polyketide backbone. The
second class of PKSs, called Type II PKSs, is represented by the synthases for aromatic
compounds. Type II PKSs typically have a single set of iteratively used active sites (Bibb et
al. (1989) EMBO J. 8: 2727; Sherman et al. (1989) EMBO J. 8: 2717; Fernandez-Moreno
et al. (1992) J. Biol. Chem. 267: 19278).

A "nonribosomal peptide synthase" (NRPS) refers to an enzymatic complex
of eucaryotic or procaryotic origin that is responsible for the synthesis of peptides by a
nonribosomal mechanism, often known as thiotemplate synthesis (Kleinkauf and von
Doehren (1987) Ann. Rev. Microbiol. 41: 259-289). Such peptides, which can be up to 20 or
more amino acids in length, can have a linear, cyclic (cyclosporine, tyrocidine,
mycobacilline, surfactin and others) or branched cyclic structure (polymyxin, bacitracin and
others) and often contain amino acids not present in proteins or amino acids modified
through methylation or epimerization.

A "module" refers to a set of distinctive polypeptide domains that encode all
the enzyme activities necessary for one cycle of polyketide or peptide chain elongation and
associated modifications.

The terms "isolated", "purified" or "biologically pure" refer to material which
is substantially or essentially free from components which normally accompany it as found
in its native state. With respect to nucleic acids and/or polypeptides the term can refer to
nucleic acids or polypeptides that are no longer flanked by the sequences typically flanking
them in nature.

The terms "polypeptide", "peptide" and "protein" are used interchangeably
herein to refer to a polymer of amino acid residues. The terms apply to amino acid polymers
in which one or more amino acid residue is an artificial chemical analogue of a
corresponding naturally occurring amino acid, as well as to naturally occurring amino acid
polymers. The term also includes variants on the traditional peptide linkage joining the
amino acids making up the polypeptide.

The terms "nucleic acid" or "oligonucleotide" or grammatical equivalents
herein refer to at least two nucleotides covalently linked together. A nucleic acid of the
present invention is preferably single-stranded or double stranded and will generally contain
phosphodiester bonds, although in some cases, as outlined below, nucleic acid analogs are
included that may have alternate backbones, comprising, for example, phosphoramide
(Beaucage et al. (1993) Tetrahedron 49(10): 1925 and references therein; Letsinger (1970)
J. Org. Chem. 35: 3800; Sprinzl et al. (1977) Eur. J. Biochem. 81: 579; Letsinger et al. (1986)
Nucl. Acids Res. 14: 3487; Sawai et al. (1984) Chem. Lett. 805; Letsinger et al. (1988) J. Am.
Chem. Soc. 110: 4470; and Pauwels et al. (1986) Chemica Scripta 26: 1419),
phosphorothioate (Mag et al. (1991) Nucleic Acids Res. 19: 1437; and U.S. Patent No.
5,644,048), phosphorodithioate (Briu et al. (1989) J. Am. Chem. Soc. 111: 2321),
O-methylphosphoroamidite linkages (see Eckstein, Oligonucleotides and Analogues: A
Practical Approach, Oxford University Press), and peptide nucleic acid backbones and
linkages (see Egholm (1992) J. Am. Chem. Soc. 114: 1895; Meier et al. (1992) Chem. Int. Ed.
Engl. 31: 1008; Nielsen (1993) Nature 365: 566; Carlsson et al. (1996) Nature 380: 207).
Other analog nucleic acids include those with positive backbones (Denpcy et al. (1995)
Proc. Natl. Acad. Sci. USA 92: 6097); non-ionic backbones (U.S. Patent Nos. 5,386,023,
5,637,684, 5,602,240, 5,216,141 and 4,469,863; Angew. (1991) Chem. Intl. Ed. English 30:
423; Letsinger et al. (1988) J. Am. Chem. Soc. 110: 4470; Letsinger et al. (1994) Nucleoside
& Nucleotide 13: 1597; Chapters 2 and 3, ASC Symposium Series 580, "Carbohydrate
Modifications in Antisense Research", Ed. Y.S. Sanghui and P. Dan Cook; Mesmaeker et al.
(1994) Bioorganic & Medicinal Chem. Lett. 4: 395; Jeffs et al. (1994) J. Biomolecular NMR
34: 17; Tetrahedron Lett. 37: 743 (1996)) and non-ribose backbones, including those
described in U.S. Patent Nos. 5,235,033 and 5,034,506, and Chapters 6 and 7, ASC
Symposium Series 580, Carbohydrate Modifications in Antisense Research, Ed. Y.S.
Sanghui and P. Dan Cook. Nucleic acids containing one or more carbocyclic sugars are also
included within the definition of nucleic acids (see Jenkins et al. (1995) Chem. Soc. Rev.
pp. 169-176). Several nucleic acid analogs are described in Rawls, C & E News June 2, 1997
page 35. These modifications of the ribose-phosphate backbone may be done to facilitate the
addition of additional moieties such as labels, or to increase the stability and half-life of such
molecules in physiological environments.

The term "heterologous" as it relates to nucleic acid sequences such as coding
sequences and control sequences, denotes sequences that are not normally associated with a
region of a recombinant construct, and/or are not normally associated with a particular cell.
Thus, a "heterologous" region of a nucleic acid construct is an identifiable segment of
nucleic acid within or attached to another nucleic acid molecule that is not found in
association with the other molecule in nature. For example, a heterologous region of a
construct could include a coding sequence flanked by sequences not found in association
with the coding sequence in nature. Another example of a heterologous coding sequence is a




PCT/US00/00445

WO 00/40704 

construct where the coding sequence itself is not found in nature (e.g., synthetic sequences
having codons different from the native gene). Similarly, a host cell transformed with a
construct which is not normally present in the host cell would be considered heterologous for
purposes of this invention.

A "coding sequence" or a sequence which "encodes" a particular polypeptide (e.g.
a PKS, an NRPS, etc.) is a nucleic acid sequence which is ultimately transcribed and/or
translated into that polypeptide in vitro and/or in vivo when placed under the control of
appropriate regulatory sequences. In certain embodiments, the boundaries of the coding
sequence are determined by a start codon at the 5' (amino) terminus and a translation stop
codon at the 3' (carboxy) terminus. A coding sequence can include, but is not limited to,
cDNA, genomic DNA, and synthetic DNA sequences. A
transcription termination sequence will usually be located 3' to the coding sequence.

Expression "control sequences" refers collectively to promoter sequences,
ribosome binding sites, polyadenylation signals, transcription termination sequences,
upstream regulatory domains, enhancers, and the like, which collectively provide for the
transcription and translation of a coding sequence in a host cell. Not all of these control
sequences need always be present in a recombinant vector so long as the desired gene is
capable of being transcribed and translated.
"Recombination" refers to the reassortment of sections of DNA or RNA
sequences between two DNA or RNA molecules. "Homologous recombination" occurs
between two DNA molecules which hybridize by virtue of homologous or complementary
nucleotide sequences.

The term "stringent conditions" refers to conditions under which a probe will hybridize
preferentially to its target subsequence, and to a lesser extent to, or not at all to, other
sequences. "Stringent hybridization" and "stringent hybridization wash conditions" in the
context of nucleic acid hybridization experiments such as Southern and northern
hybridizations are sequence dependent, and are different under different environmental
parameters. An extensive guide to the hybridization of nucleic acids is found in Tijssen
(1993) Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization with
Nucleic Acid Probes, Elsevier, New York. Generally, highly stringent hybridization and wash
conditions are selected to be lower than the thermal melting point (Tm) for the specific
sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic
strength and pH) at which 50% of the target sequence hybridizes to a perfectly matched
probe. Very stringent conditions are selected to be equal to the Tm for a particular probe.

An example of stringent hybridization conditions for hybridization of
complementary nucleic acids which have more than 100 complementary residues on a filter
in a Southern or northern blot is 50% formamide with 1 mg of heparin at 42°C, with the
hybridization being carried out overnight. An example of highly stringent wash conditions is
0.15 M NaCl at 72°C for about 15 minutes. An example of stringent wash conditions is a
0.2x SSC wash at 65°C for 15 minutes (see Sambrook et al. (1989) Molecular Cloning - A
Laboratory Manual (2nd ed.) Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor
Press, NY, for a description of SSC buffer). Often, a high stringency wash is preceded by a
low stringency wash to remove background probe signal. An example medium stringency
wash for a duplex of, e.g., more than 100 nucleotides, is 1x SSC at 45°C for 15 minutes. An
example low stringency wash for a duplex of, e.g., more than 100 nucleotides, is 4-6x SSC at
40°C for 15 minutes. In general, a signal to noise ratio of 2x (or higher) than that observed
for an unrelated probe in the particular hybridization assay indicates detection of a specific
hybridization. Nucleic acids which do not hybridize to each other under stringent conditions
are still substantially identical if the polypeptides which they encode are substantially
identical. This occurs, e.g., when a copy of a nucleic acid is created using the maximum
codon degeneracy permitted by the genetic code.
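The example wash conditions and the 2x signal-to-noise criterion given above can be collected into a small Python sketch. The dictionary layout and the function name are hypothetical conveniences; the salt, temperature, and time values are the examples quoted in the text, and the signal values in the usage lines are invented.

```python
# Example wash conditions from the text: (buffer, temperature in deg C, minutes).
WASH_CONDITIONS = {
    "highly stringent": ("0.15 M NaCl", 72, 15),
    "stringent": ("0.2x SSC", 65, 15),
    "medium": ("1x SSC", 45, 15),
    "low": ("4-6x SSC", 40, 15),
}


def is_specific_hybridization(probe_signal, unrelated_signal, min_ratio=2.0):
    """Apply the 2x (or higher) signal-to-noise criterion.

    probe_signal and unrelated_signal are measured in the same assay and in
    the same arbitrary units; unrelated_signal is the unrelated-probe control.
    """
    if unrelated_signal <= 0:
        raise ValueError("unrelated-probe signal must be positive")
    return probe_signal / unrelated_signal >= min_ratio


buffer, temp_c, minutes = WASH_CONDITIONS["stringent"]
print(f"stringent wash: {buffer} at {temp_c} C for {minutes} min")
print(is_specific_hybridization(10.0, 4.0))  # True  (ratio 2.5)
print(is_specific_hybridization(7.0, 4.0))   # False (ratio 1.75)
```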

A "library" or "combinatorial library" of polyketides and/or polypeptides is
intended to mean a collection of polyketides and/or polypeptides (or other molecules)
catalytically produced by a PKS and/or NRPS and/or hybrid PKS/NRPS (or other possible
combination of synthetic elements) gene cluster. The library can be produced by a gene
cluster that contains any combination of native, homolog or mutant genes from aromatic,
modular or fungal PKSs and/or NRPSs. The combination of genes can be derived from a
single PKS and/or NRPS gene cluster, e.g., act, fren, gra, tcm, whiE, gris, ery, or the like,
and may optionally include genes encoding tailoring enzymes which are capable of
catalyzing the further modification of a polypeptide, polyketide, or other molecule.
Alternatively, the combination of genes can be rationally or stochastically derived from an
assortment of NRPS and/or PKS gene clusters. The library of polyketides and/or
polypeptides and/or other molecules thus produced can be tested or screened for biological,
pharmacological or other activity.

By "random assortment" is intended any combination and/or order of genes,
homologs or mutants which encode for the various PKS and/or NRPS enzymes, modules,
active sites or portions thereof derived from aromatic, modular or fungal PKS and/or NRPS
gene clusters.

By "genetically engineered host cell" is meant a host cell where the native
PKS and/or NRPS gene cluster has been altered or deleted using recombinant DNA
techniques or a host cell into which a heterologous PKS and/or NRPS and/or hybrid
PKS/NRPS gene cluster has been inserted. Thus, the term would not encompass mutational
events occurring in nature. A "host cell" is a cell derived from a procaryotic microorganism
or a eucaryotic cell line cultured as a unicellular entity, which can be, or has been, used as a
recipient for recombinant vectors bearing the PKS, NRPS, and/or hybrid gene clusters of the
invention. The term includes the progeny of the original cell which has been transfected. It
is understood that the progeny of a single parental cell may not necessarily be completely
identical in morphology or in genomic or total DNA complement to the original parent, due
to accidental or deliberate mutation. Progeny of the parental cell which are sufficiently
similar to the parent to be characterized by the relevant property, such as the presence of a
nucleotide sequence encoding a desired PKS, are included in the definition, and are covered
by the above terms.

Expression vectors are defined herein as nucleic acid sequences that direct
the transcription of cloned copies of genes/cDNAs and/or the translation of their mRNAs in
an appropriate host. Such vectors can be used to express genes or cDNAs in a variety of
hosts such as bacteria, blue-green algae, plant cells, insect cells and animal cells. Expression
vectors include, but are not limited to, cloning vectors, modified cloning vectors, and specifically
designed plasmids or viruses. Specifically designed vectors allow the shuttling of DNA
between hosts, such as bacteria-yeast or bacteria-animal cells. An appropriately constructed
expression vector preferably contains: an origin of replication for autonomous replication in
a host cell, a selectable marker, optionally one or more restriction enzyme sites, and optionally
one or more constitutive or inducible promoters. In preferred embodiments, an expression
vector is a replicable DNA construct in which a DNA sequence encoding one or more PKS
and/or NRPS domains and/or modules is operably linked to suitable control sequences
capable of effecting the expression of the products of these synthases and/or synthetases in a
suitable host. Control sequences include a transcriptional promoter, an optional operator
sequence to control transcription, and sequences which control the termination of
transcription and translation, and so forth.
A "bleomycin open reading frame", or "bleomycin ORF", or "BLM ORF"
refers to a nucleic acid open reading frame that encodes a polypeptide or polypeptide domain
that has an enzymatic activity used in the biosynthesis of a bleomycin.

A "PKS/NRPS/PKS" system refers to a synthetic system comprising an NRPS
flanked by two PKSs. A "NRPS/PKS/NRPS" system refers to a synthetic system comprising
a PKS flanked by two NRPSs. A "hybrid PKS/NRPS system" or a "hybrid NRPS/PKS
system" refers to a hybrid synthetic system comprising at least one PKS and one NRPS
module. The system can comprise multiple modules and the order can vary.

A "biological molecule that is a substrate for a polypeptide encoded by a
bleomycin biosynthesis gene" refers to a molecule that is chemically modified by one or
more polypeptides encoded by open reading frame(s) of the blm gene cluster. The
"substrate" may be a native molecule that typically participates in the biosynthesis of a
bleomycin, or can be any other molecule that can be similarly acted upon by the polypeptide.

A "polymorphism" is a variation in the DNA sequence of some members of a
species. A polymorphism is thus said to be "allelic," in that, due to the existence of the
polymorphism, some members of a species may have the unmutated sequence (i.e. the
original "allele") whereas other members may have a mutated sequence (i.e. the variant or
mutant "allele"). In the simplest case, only one mutated sequence may exist, and the
polymorphism is said to be diallelic. In the case of diallelic diploid organisms, three
genotypes are possible. They can be homozygous for one allele, homozygous for the other
allele or heterozygous. In the case of diallelic haploid organisms, they can have one allele or
the other, thus only two genotypes are possible. The occurrence of alternative mutations can
give rise to triallelic, etc. polymorphisms. An allele may be referred to by the nucleotide(s)
that comprise the mutation.
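
The genotype counts given above (three for a diallelic diploid, two for a diallelic haploid) can be checked by simple enumeration. The following is a minimal sketch only; the allele labels and the function name `genotypes` are illustrative assumptions and are not part of the specification.

```python
from itertools import combinations_with_replacement


def genotypes(alleles, ploidy):
    """Enumerate the distinct unordered genotypes for a given ploidy.

    `alleles` is any collection of allele labels (illustrative names only);
    `ploidy` is 1 for a haploid organism and 2 for a diploid organism.
    """
    return list(combinations_with_replacement(sorted(alleles), ploidy))


# Hypothetical allele labels: "A" for the original allele, "a" for the variant.
diallelic = ["A", "a"]

diploid = genotypes(diallelic, 2)   # two homozygotes plus one heterozygote
haploid = genotypes(diallelic, 1)   # one allele or the other

print(len(diploid), len(haploid))
```

For k alleles the same enumeration gives k genotypes in a haploid and k(k+1)/2 in a diploid, so a triallelic polymorphism yields six diploid genotypes.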
"Single nucleotide polymorphisms" or "SNPs" are defined by their
characteristic attributes. A central attribute of such a polymorphism is that it contains a
polymorphic site, "X," most preferably occupied by a single nucleotide, which is the site of
the polymorphism's variation (Goelet and Knapp U.S. patent application Ser. No.
08/145,145). Methods of identifying SNPs are well known to those of skill in the art (see,
e.g., U.S. Patent 5,952,174).

The following abbreviations are used herein: A, adenylation; ACP, acyl
carrier protein; AT, acyltransferase; BLM, bleomycin; C, condensation; Cy,
condensation/cyclization; KR, ketoreductase; KS, ketoacyl synthase; MT, methyltransferase;
NRPS, nonribosomal peptide synthetase; orf, open reading frame; Ox, oxidation; PCP,
peptidyl carrier protein; PCR, polymerase chain reaction; PKS, polyketide synthase; Sv,
Streptomyces verticillus; ArCP, aryl carrier protein; bp, base pair; CoA, co-enzyme A; DTT,
dithiothreitol; FAS, fatty acid synthase; kb, kilobase; PPTase, 4'-phosphopantetheinyl
transferase; TCA, trichloroacetic acid; and DEBS, 6-deoxyerythronolide B synthase.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figures 1A and 1B illustrate the biosynthetic pathway for bleomycin in Sv
(ATCC 15003). Figure 1A illustrates a biosynthetic pathway for BLM in Sv ATCC15003;
intermediates except those in brackets were identified. Figure 1B shows a linear model for
the Blm megasynthetase-templated assembly of the BLM peptide/polyketide/peptide
aglycone from nine amino acids and one acetate; shaded circles represent atypical domains
carrying out the proposed novel chemistry, and arrows with broken lines indicate where
biosynthetic intermediates were derailed. Three-letter amino acid designations were used.
[HO], hydroxylation; [H], reduction.

Figure 2 provides a restriction map and gene organization of the blm gene
cluster from Sv ATCC15003 (B, BamHI). Proposed functions for individual open reading
frames are summarized in Tables I and II. Modules for individual NRPSs and PKSs are
given along with their proposed substrates in parentheses.

Figures 3A, 3B, 3C, and 3D illustrate the determination of substrate
specificity for NRPS-1 and NRPS-6. Figure 3A shows a comparison of the A3 to A6 region
of A domains to 84 NRPS modules available at GenBank that activate various amino acids.
Figure 3B shows a comparison of amino acid residues that putatively line the substrate
binding pockets for A domains (single-letter amino acid designations were used). The
number following the protein name indicates the order of a particular A domain in the
multimodular NRPS protein. The protein accession numbers are P48663 (HMWP2), P19828
(AngR), AAC06346 (BacA-2), CAB03756 (MbtB), 3510629 (SyrE-7), 3114612 (AcmB-1),
CAA67248 (SnbC-1), and 3560507 (FxbC-2). Dhb stands for 2,3-dehydroaminobutyric
acid. It is not known if Dhb is the direct substrate for SyrE-7 or resulted from dehydration of
an SyrE-7 activated Thr (Guenzi et al. (1998) J. Biol. Chem. 273: 32857-32863). Figure 3C
illustrates purified proteins after overexpression in E. coli as analyzed by electrophoresis on
a 10% SDS-polyacrylamide gel (the calculated molecular weights for NRPS-1A and NRPS-6A
are 64,212 and 61,899, respectively). Figure 3D illustrates substrate specificities as
determined by the ATP-PPi exchange reaction with the amino acids of BLM as substrates
(100% relative activity corresponds to 103,000 cpm for NRPS-1A and 256,000 cpm for
NRPS-6A).

Figure 4 illustrates a three-module NRPS/PKS/NRPS model for channeling
the growing intermediate between NRPS and PKS modules and between PKS and NRPS
modules. The KS, ACP, and C domains are shaded to emphasize their unique activities that
are responsible for elongating a growing peptide with a short carboxylic acid and a growing
polyketide with an amino acid in hybrid peptide/polyketide/peptide biosynthesis.

Figure 5 illustrates the use of the blmVIII methyltransferase domain to introduce
branched methyl groups in a polyketide synthesis. PCK12 has been described by Kao et al.
(1995) J. Am. Chem. Soc. 117: 9105-9106. DE-1, DE-2 and DE-3 are three representative
products demonstrating the strategy and utility of blmVIII in introducing a CH3 group in
polyketide biosynthesis.

Figure 6 illustrates the use of the blm NRPS and PKS enzymes to synthesize a
variety of hybrid polyketide/peptide molecules including, but not limited to, a family of
oxazolines/oxazoles, and thiazolines/thiazoles.

Figure 7 illustrates the use of elements of the blm gene cluster to synthesize
various sugars.

Figure 8A shows a restriction map of the blm gene cluster from Sv
ATCC15003 (B, BamHI). Figure 8B shows the relative position of the blmI, blmII, and blmXI
genes to the two blmAB resistance genes (blmr, Blm resistance). Individual open reading
frames are represented by open arrows. Figure 8C shows the nucleotide sequence of the
blmI gene. The potential ribosome-binding site (RBS) and the conserved motif for
4'-phosphopantetheinylation are underlined. The sequence has been deposited into GenBank
under accession no. ______.

Figure 9 shows an amino acid sequence comparison of BlmI with PCP
domains of known type I NRPSs (Grs-2 [P14688], 36% identity, 58% similarity; Srfa-3
[Q08787], 40% identity, 64% similarity; Vir-s [Y11547], 36% identity, 60% similarity;
Saf-b [U24657], 40% identity, 54% similarity). Given in brackets are nucleotide sequence
accession numbers. The shaded letters indicate similar amino acids. Consensus residues are
amino acids that are similar in more than three sequences. The signature motif for
4'-phosphopantetheinylation is underlined.

Figures 10A and 10B show the HPLC analysis of BlmI purified from E. coli
OG7001(pBS2) (Fig. 10A), and E. coli OG7001(pBS2/pDPT-Gsp) (Fig. 10B).

Figure 11 shows the enzyme architecture of type I and type II PKS and
NRPS. A, adenylation domain; ACP, acyl carrier protein or ACP domain; AT, acyl
transferase; C, condensation protein or C domain; KS, β-ketoacyl synthase domain; KSα,
β-ketoacyl synthase α subunit; KSβ, β-ketoacyl synthase β subunit; PCP, peptidyl carrier
protein or PCP domain.

Figure 12 illustrates the reaction catalyzed by phosphopantetheinyl
transferases (PPTases).

Figure 13 shows a restriction map and gene organization of the pptA locus
from Sv ATCC 15003.

DETAILED DESCRIPTION

Polyketides and polypeptides can be assembled in a remarkably similar
manner by repetitive addition of an extending unit to a growing chain by polyketide
synthases (PKSs) and nonribosomal peptide synthetases (NRPSs), respectively. In the case of
polyketides, the extending unit is typically a fatty acid (activated as an acyl CoA thioester)
while the extending unit for polypeptides is typically an amino acid (activated as an
aminoacyl adenylate). Both the PKS and NRPS systems have evolved a modular
organization to define the number, sequence, and specificity of the incorporation of the
extending unit and utilize the 4'-phosphopantetheine prosthetic group to channel the
growing intermediate during the elongation process.

This invention pertains to the discovery that a PKS-bound growing polyketide
intermediate can be further elongated by an NRPS module, or conversely, an NRPS-bound
growing polypeptide intermediate can be further elongated by a PKS module. This
discovery permits the exploitation of NRPS, PKS, and hybrid NRPS/PKS systems to provide
a number of novel hybrid peptide/polyketide metabolites from amino acids and short fatty
acids.

It was also a discovery of this invention that this hybrid NRPS/PKS/NRPS
system is exemplified by the bleomycin (Blm) biosynthesis pathway in Streptomyces
verticillus (Sv) (ATCC 15003). The bleomycins are a family of glycopeptide-derived
antibiotics originally isolated by Umezawa in 1966 from the fermentation broth of S.
verticillus. Bleomycins (BLMs) exhibit strong anti-tumor activity and are currently used in the
treatment of lymphoma, particularly Hodgkin's disease, testicular tumors, squamous cell
carcinomas of skin, head, cervix, penis, rectum, and for intracavitary therapy of malignant
effusions in ovarian and breast cancer. The commercial product, Blenoxane®, contains
BLM A2 and B2 as the principal constituents. Almost uniquely among anticancer drugs,
BLM does not cause myelosuppression, promoting its wide application in combination
chemotherapy.

In one aspect, this invention provides a cloned and characterized BLM gene
cluster consisting of characteristic NRPS and PKS genes from the Blm producer
Streptoverticillium sp. (ATCC 15003). The cloned and isolated Blm gene cluster provides a
method of recombinantly expressing bleomycin and/or bleomycin analogues. Thus, in one
embodiment, this invention provides for nucleic acids encoding bleomycin synthetic
machinery or subunits thereof, for cells recombinantly modified to express a bleomycin
and/or bleomycin analogue, and for a bleomycin or bleomycin analogue recombinantly
expressed in such cells.

Like other polyketide synthases or nonribosomal peptide synthetases, the
bleomycin synthetic pathway is organized into modules, each module catalyzing the addition
and/or modification of one subunit (e.g. fatty acid or amino acid). Each module is organized
into a number of domains, each domain having a characteristic activity (e.g. activation,
condensation, condensation/cyclization, etc.). The catalytic domains within a module and
the modules themselves are often arranged collinearly, and the order of biosynthetic modules
from NH2- to COOH-terminus on each PKS and NRPS polypeptide and the number and type
of catalytic domains within each determine the order of structural and functional elements in
the resulting product. The size and complexity of the ultimately formed product are
controlled by the number of repeated acyl chain extension steps that are, in turn, a function
of the number and placement of carrier protein domains in these multimodular enzymes.
The number, composition, and order of such domains can be altered either to introduce
modifications, e.g. into the bleomycin to produce bleomycin analogues, or to produce
different or completely new molecules. Such "recombination" is not restricted solely to
recombination among the bleomycin catalytic domains and/or modules, but can also involve
recombination between bleomycin modules and/or subunits and other PKS and/or NRPS
modules and/or subunits. Moreover, the discovery that synthetic pathways can incorporate
both PKS and NRPS modules and/or catalytic domains makes available hybrid PKS/NRPS
syntheses.
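
The modular logic described above, in which the order of modules determines the order of units in the product, can be pictured as a simple data structure. The sketch below is illustrative only: the class name `Module`, the helper functions, the domain compositions and the unit labels are assumptions made for the example and do not describe the actual Blm megasynthetase.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Module:
    kind: str       # "NRPS" or "PKS"
    domains: tuple  # catalytic domains, N- to C-terminal (illustrative)
    unit: str       # extender unit incorporated (illustrative label)


def system_type(modules):
    """Collapse runs of like modules into a label such as 'NRPS/PKS/NRPS'."""
    return "/".join(kind for kind, _ in groupby(m.kind for m in modules))


def is_hybrid(modules):
    """A hybrid system contains at least one PKS and one NRPS module."""
    return {m.kind for m in modules} >= {"NRPS", "PKS"}


def unit_order(modules):
    """Under colinearity, product unit order follows module order."""
    return [m.unit for m in modules]


# Hypothetical three-module assembly line in the spirit of Figure 4.
toy_line = [
    Module("NRPS", ("C", "A", "PCP"), "amino acid 1"),
    Module("PKS", ("KS", "AT", "ACP"), "short carboxylic acid"),
    Module("NRPS", ("C", "A", "PCP"), "amino acid 2"),
]

print(system_type(toy_line), is_hybrid(toy_line), unit_order(toy_line))
```

Reordering, deleting or swapping entries in such a list corresponds to the "recombination" of modules discussed above; the sketch says nothing about whether a given rearrangement would be functional in practice.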

Thus, in one embodiment this invention contemplates the use of blm gene
cluster modules and/or catalytic domains to make various peptide and/or polyketide, and/or
hybrid polypeptide/polyketide metabolites (including, but not limited to bleomycin
intermediates or shunt metabolites), in combinatorial biosynthesis with other polyketide
synthases and/or other nonribosomal peptide synthetases.

The blm gene cluster contains several glycosylases which can be used alone
or in context with other PKS and/or NRPS modules or catalytic domains to make various
metabolites with sugars associated with bleomycins (bleomycin sugars).

In addition, the blm gene cluster includes a novel methyltransferase domain
that can be used to make polyketide metabolites with methyl branch(es).

The blm gene cluster also is characterized by the unusual Cy domains as well
as the unprecedented Ox domain (see, e.g., BlmIV and BlmIII NRPSs), providing an efficient
biosynthesis for a bithiazole structure. The blm gene cluster, blm modules, or blm catalytic
domains can be used either individually or collectively (alone or in combinations with other
nonribosomal peptide synthetases or polyketide synthases) to make thiazolidine, thiazoline
and thiazole, bi-thiazolidine, bithiazoline, and bithiazole-containing microbial metabolites.
Other uses include, but are not limited to the usage of the blm gene
cluster/modules/catalytic units (either individually or collectively) or the Blm model to make
heterocyclic ring-containing microbial metabolites, such as five member S- and N-
containing compounds of the thiazolidine, thiazoline and thiazole family or the O- and N-
containing compounds of the oxazolidine, oxazoline, and oxazole family or to make sugars,
such as L-sugars (with the BlmG epimerase), sugars modified by carbamoyl group (with
BlmD), and disaccharides.

This invention also includes the discovery of a novel discrete PCP protein
(encoded by the blmI gene). Apo-BlmI can be efficiently modified into holo-BlmI either in
vivo or in vitro by PCP-specific 4'-phosphopantetheine transferases (PPTases) such as Gsp
and Sfp. Unlike the PCP domains in type I NRPSs, BlmI lacks its cognate A domain and can
be aminoacylated by Val-A, an A domain from a completely unrelated type I NRPS. BlmI,
therefore, represents the first characterized type II PCP, providing the genetic and
biochemical evidence to support the existence of a type II NRPS. The latter system is
useful, in a manner analogous to the type I NRPS, i.e., modular NRPS, in the combinatorial
manipulation of NRPS proteins to generate novel peptides. This invention also includes the
discovery and characterization of a novel PPTase (encoded by the pptA gene in Figure 13).
This PPTase can be used in engineered biosynthesis of polyketides, peptides, hybrid peptide
and polyketide metabolites, hybrid polyketide and peptide metabolites, or the combination of
both types of metabolites. The PPTase can also be used in converting apo-peptidyl carrier
proteins (both type I and type II) and acyl carrier proteins (both type I and type II) into the
holo-proteins.

The Examples provided herein and the accompanying primers permit one of
ordinary skill in the art to isolate the blm gene cluster of this invention, its constituent ORFs,
various modules, or enzymatic domains. The isolated nucleic acid components can be used
to express one or more polypeptide components for in vivo (e.g. recombinant) synthesis of
one or more polypeptides and/or polyketides as indicated above. It will also be appreciated
that the blm cluster polypeptides can be used for ex vivo assembly of various
macromolecules.

I. BLM gene cluster and the PPTase gene.

A) The BLM gene cluster.

The nucleic acids comprising the blm gene cluster are identified in Tables I
and II and listed in the sequence listing provided herein (SEQ ID NOS: 1 and 2, GenBank
Accession numbers AF149091, AF210249, AF210311). In particular, Table I identifies
genes and functions of open reading frames (ORFs) responsible for the biosynthesis of the
hybrid peptide/polyketide/peptide backbone and sugar moieties of bleomycin, while Table II
identifies a number of ORFs comprising the blm gene cluster, identifies the activity of the
catalytic domain encoded by the ORF and provides primers for the amplification and
isolation of that orf.

As illustrated in Example 1, the blm cluster comprises a PKS module, flanked
by several NRPS modules along with several sugar biosynthesis genes and genes encoding
other biosynthesis enzymes as well as several resistance and regulatory genes (Table I).

Table I. Determined functions of ORFs in the bleomycin biosynthesis gene cluster

[Table body not recoverable. Surviving column headings: Amino acids; Sequence Homolog;
Proposed function. Surviving entry under Sequence Homolog: HMWP2 (P48633), McbC (P23185).]

[...] Underlined domains contain motifs that are clearly different from known NRPS or PKS
domains. 3. This A domain lacks the typical NRPS A [...] A9 motifs and more closely
resembles acyl CoA synthetases. [...] ORF7 were reported by Schmidt (1994) Gene 151:
17-21, who assigned ORF2 as blmA and ORF4 as blmB.

Noteworthy are the genes encoding the NRPS and PKS enzymes. The blmI,
blmII, and blmXI genes encode NRPSs with an unusual architecture. In contrast to all known
NRPSs, which are of modular organization with each module consisting minimally of a
condensation (C), an adenylation (A), and a peptidyl carrier protein (PCP) domain, BlmI,
BlmII, and BlmXI are discrete proteins homologous to individual domains of type I NRPSs.
We have characterized BlmI as a type II PCP (Du and Shen (1999) Chem. Biol. 6: 507-517).
The BlmII and BlmXI proteins can serve as candidates for type II condensation enzymes.

The blmIII, blmIV, blmV, blmVI, blmVII, blmIX, and blmX genes encode
modular NRPSs consisting of domains characteristic for known type I NRPSs, such as the A,
PCP, C, and condensation/cyclization (Cy) domains, as well as an unprecedented oxidation
(Ox) domain. [...] (NRPS-5) consists of an atypical A domain, which bears a close
resemblance to a family of acyl CoA synthases (Fitzmaurice and Kolattukudy (1997) J.
Bacteriol. 179: 2608-2615; Fitzmaurice and Kolattukudy (1998) J. Biol. Chem. 273:
8033-8039), and an acyl carrier protein (ACP)-like domain. Its C-terminal module is
truncated and presumably interacts with BlmV to constitute the complete NRPS-3 module
(Fig. 1B). Also noteworthy is the C domain of NRPS-3 that lacks both His residues of the
conserved HHxxxDG (SEQ ID NO: 4) active site for transpeptidation (Stachelhaus et al.
(1998) J. Biol. Chem. 273: 22773-22781)
and the extra C domain at the C-terminus of BlmV. These unusual features associated with
BlmVI and BlmV may play roles in the formation of the β-aminoalaninamide and the
pyrimidine moieties of BLM, which are unprecedented in peptide biosynthesis.

The blmVIII gene encodes a PKS module consisting of domains characteristic
for known PKSs, such as ketoacyl synthase (KS), acyltransferase (AT), ketoreductase (KR),
and ACP, with malonyl CoA acting as an extending unit according to sequence comparison
of the AT domain (Haydock et al. (1995) FEBS Lett. 374: 246-248) (Fig. 1B).

The identification of an integrated methyltransferase (MT) domain in the
middle of BlmVIII is unique, representing the first PKS from actinomycetes that contains an
internal MT domain.



Table II. Blm gene cluster open reading frames (ORFs) and primers for ORF amplification.

Orf# | Position | Activity | Method | Forward primer (F) | Reverse primer (R) | SEQ ID NO (F, R)
orf-8 | 76183-77457 | Oxygen-independent coproporphyrinogen III oxidase | Gapped-blast comparison [1] | F: ATGAGCCACGCCATCGGA | R: TCAGGCGCGTTCGGGGGC | 5, 6
orf-9 | 74690-76186 | ADP-heptose synthase (blmC) | Gapped-blast comparison [1] | F: GTGAACACCGACCTGCCC | R: TCATGGGGTGTCTCCCTC | 7, 8
orf-10 | 74421-74693 | Peptidyl carrier protein (blmI) | Expression and biochemical characterization [2] | F: ATGAGCGCCCCGCGGGGC | R: TCACCGGTCCCGCTCCCC | 9, 10
orf-11 | 72787-74424 | Carbamyltransferase (blmD) | Gapped-blast comparison [1] | F: ATGAGCGCCGACCCGTCC | R: TCATGAGCGGGCCGCCGT | 11, 12
orf-12 | 71618-72790 | ADP-heptose:LPS heptosyl transferase (blmE) | Gapped-blast comparison [1] | F: ATGACCACCCCCATGACC | R: TCATGGGGTACTCCTGAT | 13, 14
orf-13 | 70983-71546 | Homolog of mbtH in the synthesis of mycobactin | Gapped-blast comparison [1] | F: ATGACCACGACCCCGCGG | R: TCAGGTGCCGGACACGCG | 15, 16
orf-14 | 69598-70986 | Peptide synthetase (condensation, blmII) | Gapped-blast comparison [1] | F: GTGACCGCCCCCGGCACA | R: TCATCGGTGGCTCCTCGT | 17, 18
orf-15 | 68582-69601 | Regulatory gene (homolog of syrP) | Gapped-blast comparison [1] | F: GTGAACCGGCACGGCCCC | R: TCACGCGCTCACCTCGTC | 19, 20
orf-16 | 65778-68585 | Mutated peptide synthetase-oxidase (NRPS-0, blmIII) | Gapped-blast comparison [1] | F: GTGACGAGCGCCCGGCCC | R: TCACGGGGCCTCCGTGCG | 21, 22
orf-17 | 57901-65781 | Peptide synthetase (NRPS-2-1, blmIV) | Expression and biochemical characterization [2] | F: ATGCTGCACGGCGCCGCG | R: TCACTCCGGTCCACCTCC | 23, 24
orf-18 | 55899-[illegible] | Asparagine synthetase | Gapped-blast comparison [1] | F: [illegible] | R: [illegible] | 25, 26
orf-19 | 54418-55902 | Homolog of hydroxylase-dehydrogenase (blmF) | Gapped-blast comparison [1] | F: GTGAAGGACCTCGGCCGG | R: TCACTCCCCCGGTGCCGG | 27, 28
orf-20 | 53427-54404 | Nucleotide-sugar epimerase (blmG) | Gapped-blast comparison [1] | F: GTGACATGGACCGTGGTG | R: TCAGGCATCGGCCCTCCC | 29, 30
orf-21 | 51493-53430 | Peptide synthetase (NRPS-3CT, blmV) | Gapped-blast comparison [1] | F: ATGCGCGGGCATGACGAC | R: TCACGGTGTCTCTCCCTC | 31, 32
orf-22 | 43263-51290 | Peptide synthetase (NRPS-5-4-3, blmVI) | Expression and biochemical characterization [2] | F: ATGAGCCGGCCGGCCGGC | R: TCATGCTCGGTCATCGCC | 33, 34
orf-23 | 39610-43266 | Peptide synthetase (NRPS-6, blmVII) | Expression and biochemical characterization [2] | F: GTGACCACGCCCCGCATC | R: TCATTCGGGACGCGGGCA | 35, 36
orf-24 | 34088-[illegible] | Polyketide synthase (blmVIII) | Gapped-blast comparison [1] | F: ATGAGCCATGCCGACGCG | R: TCACAGCACCACCTCTTC | 37, 38
orf-25 | 30891-[illegible] | Peptide synthetase (NRPS-7, blmIX) | Gapped-blast comparison [1] | F: [illegible] | R: [illegible] | 39, 40
orf-26 | 24406-[illegible] | Peptide synthetase (NRPS-9-8, blmX) | Gapped-blast comparison [1] | F: ATGCCTCGGTGTGCCCGA | R: TCATTCGGCGGCACCTCC | 41, 42
orf-27 | 22127-24193 | Peptide synthetase (condensation, blmXI) | Gapped-blast comparison [1] | F: GTGGGTTTCCGTCGAGCG | R: TTACACCCTCCGTTTCTC | 43, 44
orf-28 | 21367-22086 | Phosphatidylserine decarboxylase | Gapped-blast comparison [1] | F: ATGGCACAGGACCTGAAC | R: TCAACGCCACCGGATCTT | 45, 46
orf-29 | 19161-[illegible] | Transmembrane transporter | Gapped-blast comparison [1] | F: GTGAGCTCCCTCGCCGTC | R: TCATCGTCGGGCACTCGG | 47, 48
orf-30 | 18823-19164 | Metal dependent regulatory element | Gapped-blast comparison [1] | F: GTGCCGGTTCCGCTGTAT | R: TCACCGGGCACTGACCTC | 49, 50
orf-31 | 18660-18307 | PHNA homolog | Gapped-blast comparison [1] | F: GTGACCGAGAACCTTCCG | R: TCAGACCTTCTTGACCAC | 51, 52
orf-32 | 17736-[illegible] | Peptide synthetase (NRPS-11-10) | Gapped-blast comparison [1] | F: [illegible] | R: [illegible] | 53, 54
orf-33 | 9214-7859 | Putative transporter | Gapped-blast comparison [1] | F: ATGATGAAGTCAAGCCGC | R: TCAGTGGCTTACAAGGAG | 55, 56
orf-34 | 7797-6784 | Homolog of clavaminic acid synthase | Gapped-blast comparison [1] | F: ATGACTGACCTGCCGTTG | R: TCACACCAGCAGCGAGGT | 57, 58
orf-35 | 6773-6021 | Thioesterase | Gapped-blast comparison [1] | F: ATGGATTTCCCCCTCACC | R: TCATGCCCCTACCTCGGC | 59, 60
orf-36 | 6024-4741 | Putative transporter | Gapped-blast comparison [1] | F: ATGACCGCGCGCGTCGAC | R: TCACTCCTCGGCTTCGGC | 61, 62
orf-37 | 4733-3915 | Unknown | Gapped-blast comparison [1] | F: GTGTCCAAGAACGCGGCG | R: TCATCGGCTCGCCTCGTG | 63, 64
orf-38 | 3918-2182 | Peptide synthetase (NRPS-12) | Gapped-blast comparison [1] | F: ATGACCCTCACCCTGCGG | R: TCACTCGGGCACTCCTTC | 65, 66
orf-39 | 2185-1199 | Regulatory gene (homolog of SyrP) | Gapped-blast comparison [1] | F: GTGACCGGTTCCGTAACG | R: TCATGAGTCCGCCGAGGT | 67, 68
orf-40 | 1015-1 | Peptide synthetase | Gapped-blast comparison [1] | F: ATGACAGAGGTCCGAGGT | R: [illegible] | 69, [illegible]
orf-41 | On a separate sequence | 4'-phosphopantetheinyl transferase (pptA) | Expression and biochemical characterization [2] | F: GTGATCGCCGCCCTCCTG | R: TTACGGGACGGCGGTCCG | 71, 72



The Blm megasynthetase comprises nine NRPS modules and one PKS
module forming a hybrid NRPS/PKS/NRPS megasynthetase (Fig. 1A). Inspection of the blm
gene cluster (Fig. 2) showed that the Blm NRPS and PKS modules apparently are not
organized according to the "colinearity rule" for BLM biosynthesis (Fig. 1). Detailed
functional organization of the megasynthetase and the BLM synthetic pathway is provided in
Example 1.

B) PPTase

This invention also provides the gene (pptA, Fig. 13) encoding
phosphopantetheine transferase (PPTase) (GenBank Accession No: AF210311) (see, SEQ ID
NO: 3). PPTase converts carrier proteins for the growing acyl chain from inactive apo-forms
to functional holo-forms by the covalent attachment of the 4'-phosphopantetheine moiety of
coenzyme A to a conserved serine residue of the carrier-protein substrate (see, e.g., Fig. 1A).

Using the sequence information provided herein (e.g. primer sequences and
PPTase sequence) the PPTase nucleic acids can be routinely isolated according to standard
methods (e.g. PCR amplification). Detailed protocols for the isolation of the PPTase are
provided in Example 3.

Other PPTases can be identified using the probes and primers illustrated in
Example 3. Briefly, using a primer to the THC motif (5'-C GGC ATG GTC GGC TCC HTN
CAN CAY TG -3', SEQ ID NO: 73, where H = C + A, N = A + C + T + G, Y = C + T, K = G
+ T, R = A + G, W = T + A), and a primer designed around the typical C-terminal PPTase
motif (e.g. KEA-1: 5'-T GCA GCA GAA CAG GAG GCK NYC CCA NKG - 3', SEQ ID
NO: 74, and KEA-2: 5'- TG GOT CAG CGG GTA CCA NRC YTT RWA - 3', SEQ ID NO:
75), and using S. verticillus chromosomal DNA as template, a probe (about 250 bp) that
specifically binds to a PPTase can be amplified with the primer set THC/KEA-2. Libraries of
organisms comprising NRPS, PKS, and/or hybrid PKS/NRPS pathways can be probed for
the presence of a PPTase sequence. Once hybridizing clones are identified, the PPTase
sequence can be isolated according to standard methods well known to those of skill in the art
(see, e.g., Example 3).
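
To make the degenerate-primer notation above concrete, the following sketch counts and expands the sequences represented by the THC primer. It is an illustration only: the code table is copied from the definitions as printed in the text (where H is given as C + A, which is narrower than the conventional IUPAC meaning of H), and the function names are assumptions introduced for the example.

```python
from itertools import product

# Degeneracy codes as listed in the text above. Note: the text defines
# H = C + A; the conventional IUPAC code H also includes T.
CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "H": "CA", "N": "ACTG", "Y": "CT", "K": "GT", "R": "AG", "W": "TA",
}


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    n = 1
    for base in primer:
        n *= len(CODES[base])
    return n


def expand(primer):
    """List every concrete sequence represented by a degenerate primer."""
    return ["".join(p) for p in product(*(CODES[b] for b in primer))]


# THC primer (SEQ ID NO: 73) with the spacing removed.
THC = "CGGCATGGTCGGCTCCHTNCANCAYTG"

print(len(THC), degeneracy(THC))
```

Under the code table as printed, the four degenerate positions (H, N, N, Y) give 2 x 4 x 4 x 2 = 64 sequences; reading H in the conventional three-base sense would give 96 instead.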
II. [...] nucleic acids.

In one embodiment this invention provides nucleic acids for the recombinant
expression of a bleomycin. Such nucleic acids include isolated gene cluster(s) comprising
[...] the assembly of a bleomycin.

In other embodiments of this invention, modified bleomycins (e.g., bleomycin
analogs), novel polyketides, polypeptides, [...] are created by modifying PKSs and/or NRPSs
so as to introduce variations into [...], for example to modify a known molecule in a specific
way, e.g. by replacing a single [...] structure. [...] library of molecular variants of a known
polymer by systematically or haphazardly [...] collection of alternative modules or domains.
Production of alternative/modified PKSs, NRPSs and hybrid systems is described below.

Using the primer and sequence information provided herein, one of ordinary
[...] domains described herein. For example, the PCR primers provided in Table II can
be used to amplify any of the orfs identified therein. Moreover, using the sequence
information for the blm gene cluster provided herein, the design of other primers suitable
for the amplification of individual ORFs, combinations of ORFs, genes, etc. is routine.

Typically such amplifications will utilize the DNA of an organism containing
the requisite genes (e.g., [...]) [...] polymerase, 1 x buffer, with or without 20% glycerol
in a final volume of [...] scheme as follows: initial denaturing at 94°C for 5 min, [...] cycles
of [...] sec at [...]°C, [...] at 60°C, 2 min at [...] followed by an additional 7 min at 72°C.
One [...] U.S. Patent No. 4,683,202; [...] Academic Press Inc. San Diego, CA, etc.). In
addition, primers may be designed to introduce restriction sites and so facilitate cloning of
the amplified sequence into a vector.
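
As a rough sanity check on a Table II primer pair before amplification, one can confirm that the listed ORF length is a whole number of codons and that the primers begin at a start codon and end at a stop codon. The sketch below is a minimal illustration; it assumes that the positions in Table II are inclusive coordinates and that each primer pair spans the ORF exactly, neither of which is stated explicitly in the text, and the function names are hypothetical.

```python
def revcomp(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(seq))


def check_orf(position, forward, reverse):
    """Summarize an ORF entry given 'start-end' coordinates and its primers.

    Assumes inclusive coordinates (an assumption, not stated in Table II).
    The coordinate order is ignored, since some entries are listed high-to-low.
    """
    first, second = (int(x) for x in position.split("-"))
    length = abs(second - first) + 1
    return {
        "length_bp": length,
        "in_frame": length % 3 == 0,
        "start_codon": forward[:3],
        # The reverse primer anneals to the coding strand, so the stop codon
        # is the last triplet of its reverse complement.
        "stop_codon": revcomp(reverse)[-3:],
    }


# orf-10 (blmI) as listed in Table II.
orf10 = check_orf("74421-74693", "ATGAGCGCCCCGCGGGGC", "TCACCGGTCCCGCTCCCC")
print(orf10)
```

For orf-10 this gives a 273 bp product (91 codons including the stop) beginning with ATG and ending with TGA; entries whose forward primers begin with GTG use the alternative start codon common in Streptomyces.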
Using the information provided herein other approaches to cloning the desired
sequences will be apparent to those of skill in the art. For example, the PKS or NRPS 
modules or enzymatic domains of interest can be obtained from an organism that expresses 
the same, using recombinant methods, such as by screening cDNA or genomic libraries, 
derived from cells expressing the gene, or by deriving the gene from a vector known to 
include the same. The gene can then be isolated and combined with other desired NRPS 
and/or PKS modules or domains, using standard techniques. If the gene in question is 
already present in a suitable expression vector, it can be combined in situ, with, e.g., other 
PKS subunits, as desired. The gene of interest can also be produced synthetically, rather 
than cloned. The nucleotide sequence can be designed with the appropriate codons for the 
particular amino acid sequence desired. In general, one will select preferred codons for the 
intended host in which the sequence will be expressed. The complete sequence can be 
assembled from overlapping oligonucleotides prepared by standard methods and assembled 
into a complete coding sequence (see, e.g., Edge (1981) Nature 292: 756; Nambiar et al.
(1984) Science 223: 1299; Jay et al. (1984) J. Biol. Chem. 259: 6311). In addition, it is noted
that custom gene synthesis is commercially available (see, e.g. Operon Technologies, 
Alameda, CA). 
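
The step of selecting codons for an intended host can be sketched as a simple back-translation. The example below is illustrative only: the rule used to pick a "preferred" codon (the first synonymous codon, in alphabetical order, that ends in G or C) is a made-up stand-in for a real host codon-usage table, chosen merely because Streptomyces genomes are GC-rich, and all function names are assumptions.

```python
# Standard genetic code, built from the conventional TCAG-ordered table.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def preferred_codons(table=CODON_TABLE):
    """Pick one codon per amino acid using a hypothetical GC-third-base rule."""
    choice = {}
    for codon in sorted(table):
        aa = table[codon]
        if codon[2] in "GC" and aa not in choice:
            choice[aa] = codon
    return choice


PREFERRED = preferred_codons()


def back_translate(protein):
    """Design a coding sequence for `protein` from the preferred codons."""
    return "".join(PREFERRED[aa] for aa in protein)


def translate(dna):
    """Translate a coding sequence whose length is a multiple of three."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


peptide = "MSAPRG"  # first six residues encoded by the orf-10 forward primer
designed = back_translate(peptide)
print(designed, translate(designed) == peptide)
```

In practice the preferred-codon dictionary would be replaced by measured codon frequencies for the chosen host; the round-trip check (translating the designed sequence back to the peptide) is the part of the sketch that carries over unchanged.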

Examples of such techniques and instructions sufficient to direct persons of 
skill through many cloning exercises are found in Berger and Kimmel (1989) Guide to 
Molecular Cloning Techniques, Methods in Enzymology 152 Academic Press, Inc., San 
Diego, CA (Berger); Sambrook et al. (1989) Molecular Cloning - A Laboratory Manual (2nd 
ed.) Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor Press, NY; Ausubel
(1994) Current Protocols in Molecular Biology, Current Protocols, a joint venture between
Greene Publishing Associates, Inc. and John Wiley & Sons, Inc., U.S. Patent 5,017,478; and 
European Patent No. 0,246,864. 

II. Expression of blm gene clusters, modules and enzymatic domains.

As indicated above, in one embodiment this invention provides novel NRPS 
and PKS genes for the efficient recombinant production of both novel and known 
polyketides, peptides, and polyketide/polypeptide hybrids by expressing them in vivo. In 
other embodiments, such syntheses are carried out in vitro. Even in vitro syntheses, 
however, typically utilize recombinantly expressed PKSs, NRPSs, or enzymatic domains
thereof. Thus, it is frequently desirable to express protein components of the PKSs or NRPSs
described above. 

Typically, expression of the protein components of the pathway and/or of the
products of the NRPS/PKS pathway is accomplished by placing the subject PKS or NRPS 
nucleic acid(s) in an expression vector, and transfecting a cell with the vector such that the 
cell expresses the desired product(s). 

A) Expression vectors

The choice of vector depends on the sequence(s) that are to be expressed. 
Any transducible cloning vector can be used as a cloning vector for the nucleic acid
constructs of this invention. However, where large clusters are to be expressed, it is
preferred that phagemids, cosmids, P1s, YACs, BACs, PACs, HACs or similar cloning vectors
be used for cloning the nucleotide sequences into the host cell. Phagemids, cosmids, and
BACs, for example, are advantageous vectors due to the ability to insert and stably
propagate therein larger fragments of DNA than in M13 phage and lambda phage, respectively.
Phagemids which will find use in this method generally include hybrids between plasmids and
filamentous phage cloning vehicles. Cosmids which will find use in this method generally
include lambda phage-based vectors into which cos sites have been inserted. Recipient pool
cloning vectors can be any suitable plasmid. The cloning vectors into which pools of
mutants are inserted may be identical or may be constructed to harbor and express different
genetic markers (see, e.g., Sambrook et al., supra). The utility of employing such vectors
having different marker genes may be exploited to facilitate a determination of successful
transduction.

In preferred embodiments of this invention, vectors are used to introduce
PKS, NRPS, or NRPS/PKS genes or gene clusters into host (e.g. Streptomyces) cells.
Numerous vectors for use in particular host cells are well known to those of skill in the art.
For example, Malpartida and Hopwood (1984) Nature, 309: 462-464; Kao et al.
(1994) Science, 265: 509-512; and Hopwood et al. (1987) Methods Enzymol., 153: 116-166
all describe vectors for use in various Streptomyces hosts.

In a preferred embodiment, Streptomyces vectors are used that include 
sequences that allow their introduction and maintenance in E. coli. Such Streptomyces/E.
coli shuttle vectors have been described (see, for example, Vara et al. (1989) J. Bacteriol.,
171: 5872-5881; Guilfoile & Hutchinson (1991) Proc. Natl. Acad. Sci. USA, 88: 8553-8557).

The gene sequences, or fragments thereof, which collectively encode a PKS 
and/or NRPS and/or PKS/NRPS gene cluster, one or more ORFs, one or more modules, or 
one or more enzymatic domains of this invention, can be inserted into one or more 
expression vectors, using methods known to those of skill in the art. Expression vectors will
include control sequences operably linked to the desired NRPS and/or PKS coding sequence
or fragment thereof. Suitable expression systems for use with the present invention include
systems that function in eucaryotic and prokaryotic host cells. However, as explained above,
prokaryotic systems are preferred, and in particular, systems compatible with Streptomyces
spp. are of particular interest. Control elements for use in such systems include promoters,
optionally containing operator sequences, and ribosome binding sites. Particularly useful
promoters include control sequences derived from PKS and/or NRPS gene clusters, such as
one or more act promoters. However, other bacterial promoters, such as those derived from
sugar metabolizing enzymes, such as galactose, lactose (lac) and maltose, will also find use
in the present constructs. Additional examples include promoter sequences derived from
biosynthetic enzymes such as tryptophan (trp), the beta-lactamase (bla) promoter system,
bacteriophage lambda PL, and T5. In addition, synthetic promoters, such as the tac promoter
(U.S. Patent 4,551,433), which do not occur in nature also function in bacterial host cells. In
Streptomyces, numerous promoters have been described including constitutive promoters,
such as ermE and tcmG (Shen and Hutchinson (1994) J. Biol. Chem. 269: 30726-30733), as
well as controllable promoters such as actI and actIII (Pieper et al. (1995) Nature, 378:
263-266; Pieper et al. (1995) J. Am. Chem. Soc., 117: 11373-11374; and Wiesmann et al. (1995)
Chem. & Biol. 2: 583-589).

Other regulatory sequences may also be desirable which allow for regulation
of expression of the PKS replacement sequences relative to the growth of the host cell.
Regulatory sequences are known to those of skill in the art, and examples include those
which cause the expression of a gene to be turned on or off in response to a chemical or
physical stimulus, including the presence of a regulatory compound. Other types of
regulatory elements may also be present in the vector, for example, enhancer sequences.

Selectable markers can also be included in the recombinant expression 
vectors. A variety of markers are known which are useful in selecting for transformed cell 
lines and generally comprise a gene whose expression confers a selectable phenotype on 
transformed cells when the cells are grown in an appropriate selective medium. Such 
markers include, for example, genes that confer antibiotic resistance or sensitivity to the
plasmid. Alternatively, several polyketides are naturally colored and this characteristic 
provides a built-in marker for selecting cells successfully transformed by the present 
constructs. 
The various PKS and/or NRPS clusters or subunits of interest can be cloned
into one or more recombinant vectors as individual cassettes, with separate control elements,
or under the control of, e.g., a single promoter. The PKS and/or N[...]

[...] PKS- or NRPS-encoding gene clusters, in cells including Streptomyces are well
[...] Sci. USA, 84: 4445-4449; Grim et al. (1994) Gene, 151: 1-10; Kao et al. (1994) Science,
265: 509-512; and Hopwood et al. (1987) Methods Enzymol., 153: 116-166). [...]
examples nucleic acid sequences of well over 100 kb have been introduced into cells,
[...] (1998) [...], 52: 1-8; Woon et al. [...] Huang et al. [...]
24: 4202-4209). In addition, the cloning and overexpression of NRPS-1
and NRPS-6 is illustrated in Example 1.

In certain embodiments this invention may make use of genetically [...]
PKS or NRPS genes substantially deleted. These host cells can be [...]
recombinant vectors, encoding a variety of PKS and/or NRPS gene clusters, for the [...]
BLMs or other hybrid polyketide/peptide metabolites so produced can be used [...]
[...] several of the polyketides and peptides produced by the present method [...]
bacterial and parasitic infections. The ability to recombinantly produce polyketides and
peptides also provides a powerful tool for characterizing PKSs and/or NRPSs and the
mechanism of their actions.

B) Host cells.

The vectors described above can be used to express various protein
components of the polyketide and/or polypeptide synthetic modules for subsequent isolation
and/or to provide a biological synthesis of one or more desired biomolecules (e.g.
polyketides, peptides, etc.). Where one or more proteins of the blm cluster are expressed
(e.g. overexpressed) for subsequent isolation and/or characterization, the proteins are
expressed in any prokaryotic or eukaryotic cell suitable for protein expression. In one
preferred embodiment, the proteins are expressed in E. coli. Overexpression of blmI in E.
coli is described in Example 2.
Host cells for the recombinant production of the subject polyketides can be
derived from any organism with the capability of harboring a recombinant PKS, NRPS or
PKS/NRPS gene cluster. Thus, the host cells of the present invention can be derived from
either prokaryotic or eucaryotic organisms. However, preferred host cells are those
constructed from the actinomycetes, a class of mycelial bacteria which are abundant
producers of a number of polyketides and peptides. A particularly preferred genus for use
with the present system is Streptomyces. Thus, for example, S. verticillus, S. ambofaciens, S.
avermitilis, S. azureus, S. cinnamonensis, S. coelicolor, S. curacoi, S. erythraeus, S. fradiae,
S. galilaeus, S. glaucescens, S. hygroscopicus, S. lividans, S. parvulus, S. peucetius, S.
rimosus, S. roseofulvus, S. thermotolerans, S. violaceoruber, among others, will provide
convenient host cells for the subject invention, with S. coelicolor being preferred (see, e.g.,
Hopwood, D. A. and Sherman, D. H. Ann. Rev. Genet. (1990) 24:37-66; O'Hagan, D. The
Polyketide Metabolites (Ellis Horwood Limited, 1991), for a description of various
polyketide-producing organisms and their natural products).

In a preferred embodiment, the above-described cells are genetically
engineered by deleting one or more naturally occurring PKS and/or NRPS genes therefrom,
using standard techniques, such as by homologous recombination (see, e.g., Khosla et al.
(1992) Molec. Microbiol. 6: 3237).

In certain embodiments, a eukaryotic host cell is preferred (e.g. where certain
glycosylation patterns are desired). Suitable eukaryotic host cells are well known to those of
skill in the art. Such eukaryotic cells include, but are not limited to, yeast cells, insect cells,
plant cells, fungal cells, and various mammalian cells (e.g. COS, CHO, and HeLa cell lines and
various myeloma cell lines).






WO 00/40704 PCT/US00/00445

C) Protein/polyketide recovery.

Polypeptide and/or polyketide recovery is accomplished according to standard
methods well known to those of skill in the art. Thus, for example, where blm cluster
proteins are to be expressed and isolated, the proteins can be expressed with a convenient tag
to facilitate isolation (e.g. a His6 tag). Other standard protein purification techniques are
suitable and well known to those of skill in the art (see, e.g., Quadri et al. (1998)
Biochemistry 37: 1585-1595; Nakano et al. (1992) Mol. Gen. Genet. 232: 313-321, etc.).

Similarly, where components (e.g. modules and/or enzymatic domains) of the
blm cluster are used to express various biomolecules (e.g. polyketides, sugars, polypeptides,
etc.), the desired product and/or shunt metabolite(s) are isolated according to standard
methods well known to those of skill in the art (see, e.g., Carreras and Khosla (1998)
Purification and in vitro reconstitution of the essential protein components of an aromatic
polyketide synthase, Biochemistry 37: 2084-2088; Deutscher (1990) Methods in Enzymology
Volume 182: Guide to Protein Purification, M. Deutscher, ed.).

III. Synthesis of recombinant bleomycins.

In one embodiment this invention provides methods of synthesizing
bleomycins and recombinantly synthesized bleomycin. As indicated above, this is generally
accomplished by providing an organism (e.g. a bacterial cell) containing sufficient
components of the blm gene cluster to direct synthesis of a complete bleomycin.

In one embodiment, the entire blm cluster is cloned into a Streptomyces strain
(e.g., S. lividans or S. coelicolor). Kao et al. (1994) Science, 265: 509-512, have cloned the
30 kb DEBS genes from Sacc. erythraea into S. coelicolor and produced
6-deoxyerythronolide B in S. coelicolor, and these methods can be used to construct an
expression plasmid for heterologous expression of the blm cluster. This method involves the
transfer of DNA between a temperature-sensitive plasmid and a shuttle vector by means of a
homologous double recombination event in E. coli (Id.). In a preferred embodiment, the two
ends spanning the blm cluster are cloned into a temperature-sensitive plasmid that is
chloramphenicol resistant (CmR) such as pCK6. S. verticillus DNA is then rescued from a
donor into the temperature-sensitive recipient by co-transforming E. coli with the CmR
recipient plasmid and the apramycin resistant (ApR) pKC505 donor cosmid that contains the
blm gene cluster, followed by chloramphenicol and apramycin selection at 30°C. Colonies
harboring both plasmids (CmR, ApR) are shifted to 44°C on chloramphenicol and
apramycin plates, and only those cointegrates formed by a single recombination event
between the two plasmids are viable. Surviving colonies are then propagated at 30°C on
chloramphenicol plates to select for recombinant plasmids formed by the resolution of
cointegrates through a second recombination event. The desired blm cluster is thereby cloned
into the CmR temperature-sensitive plasmid and is ready to be moved into any expression
plasmid by a similar homologous recombination event.

For example, if pWHM861 is the choice of shuttle plasmid for the expression
of the blm cluster in S. lividans (Meurer and Hutchinson (1995) J. Bacteriol., 177: 477-481),
the two ends spanning the blm cluster are cloned downstream of the ErmE* promoter in the
ampicillin resistant (AmR) plasmid pWHM861. The resulting plasmid is co-transformed
with the temperature-sensitive plasmid containing the blm cluster described above into E.
coli under the selection of chloramphenicol and ampicillin at 30°C. These CmR and AmR
colonies are shifted to 44°C on chloramphenicol and ampicillin plates to undergo a single
recombination event, and the surviving colonies are resolved on ampicillin plates at 30°C by
completing the double recombination process. The resulting plasmid is suitable for
transformation into S. lividans by selection with thiostrepton, in which the expression of the
desired blm cluster is under the control of the ErmE* promoter. The S. lividans
transformants are cultured and any metabolites produced are isolated and characterized.

Once production of BLM in S. lividans is established, mutated alleles of the 
blm synthetase can be introduced into the blm cluster for the production of BLM analogs. 

IV. Altered endogenous expression of bleomycins.

Using the blm gene cluster information provided herein, one of skill in the art
may regulate the synthesis of endogenous bleomycin. The expression of various ORFs
comprising the blm gene cluster may be increased or decreased to alter bleomycin synthesis
levels.

Methods of altering the expression of endogenous genes are well known to 
those of skill in the art. Typically such methods involve altering or replacing all or a portion 
of the regulatory sequences controlling expression of the particular gene that is to be 
regulated. In a preferred embodiment, the regulatory sequences (e.g., the native promoter)
upstream of one or more of the blm ORFs are altered.

This is typically accomplished by the use of homologous recombination to
introduce a heterologous nucleic acid into the native regulatory sequences. To downregulate
expression of one or more blm ORFs, simple mutations that either alter the reading frame or
disrupt the promoter are suitable. To upregulate expression of the blm ORF(s), the native
promoter(s) can be substituted with heterologous promoter(s) that induce higher than normal
levels of transcription.

In a particularly preferred embodiment, nucleic acid sequences comprising the 
structural gene in question or upstream sequences are utilized for targeting heterologous
recombination constructs.

The use of homologous recombination to alter expression of endogenous 
genes is described in detail in U.S. Patent 5,272,071, WO 91/09955, WO 93/09222, WO 
96/29411, WO 95/31560, and WO 91/12650.

V. Synthesis of BLM analogs. 

In one embodiment, this invention provides methods of synthesizing
modified bleomycins or bleomycin analogs. In preferred embodiments, the BLM analogs are 
synthesized either by introducing specific perturbations into individual NRPS and/or PKS 
enzymatic domains or modules, or by reprogramming the linear order in which the NRPS or 
PKS enzymatic domains and/or modules appear in the blm synthetase genes. The former 
will lead to BLM analogs with targeted modifications at the BLM backbone and the latter 
will allow incorporation of other extension units in variable sequence into the biosynthesis of 
BLM. In particularly preferred embodiments, the genetically modified blm synthetases are 
produced in S. verticillus; however, it will be recognized that the entire blm gene cluster can
be cloned into other hosts, e.g. into S. lividans or S. coelicolor. 

In preferred embodiments modification of the blm gene cluster to yield BLM 
analogues is accomplished by one of two different approaches. In one approach, the BLM 
enzymatic domains and/or modules are altered in a directed manner (i.e. they are
changed in a preselected way), while in another approach, random/haphazard alterations are 
introduced into the blm cluster and the resulting products are screened to identify those with 
desired properties. 

A) Synthesis of BLM analogs by specific engineering of the blm synthetase
genes.

The blm synthetase genes can be re-engineered by means of specific 
mutations or by reprogramming the linear order of the NRPS or PKS enzymatic domains or 
modules. In this approach, a wild-type blm synthetase allele is replaced with these mutants
and expressed in an appropriate host (e.g., S. verticillus or a heterologous host). Since
both NRPSs (Stachelhaus et al. (1995) Science, 269: 69-72) and PKSs (Donadio et al. (1993)
Proc. Natl. Acad. Sci. USA, 90: 7119-7123; Donadio et al. (1995) J. Am. Chem. Soc. 117:
9105-9106; Cortes et al. (1995) Science, 268: 1487-1489) have shown considerable tolerance
to reprogramming, it is expected that these modifications of the BLM synthetase will result
in the production of BLM analogs with predicted structural alterations. For example,
targeted modification at the (2S,3S,4R)-4-amino-3-hydroxy-2-methylpentanoic acid (AHM)
moiety of BLM can be accomplished by introduction of mutations into the BlmVIII PKS
module of the BLM synthetase locus. Inactivation of the MT or KR motif by in-frame
deletion or site-directed mutagenesis will result in the production of BLM analogs containing
a demethyl-AHM, oxo-AHM, or oxo-demethyl-AHM moiety, etc.
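
The relationship between the inactivated activity and the expected AHM variant can be written out as a small lookup. This is only a sketch of one reading of the paragraph above: it assumes that loss of the methyltransferase activity removes the methyl group (demethyl-AHM) and that loss of the ketoreductase activity leaves the keto group unreduced (oxo-AHM). The function name is hypothetical.

```python
def predicted_ahm_moiety(mt_active: bool, kr_active: bool) -> str:
    """Predict the AHM-derived moiety from the state of two PKS activities.

    Assumed mapping (an interpretation, not stated explicitly in the text):
    methyltransferase off -> no methyl group; ketoreductase off -> keto group kept.
    """
    if mt_active and kr_active:
        return "AHM"
    if kr_active:
        return "demethyl-AHM"
    if mt_active:
        return "oxo-AHM"
    return "oxo-demethyl-AHM"


for mt in (True, False):
    for kr in (True, False):
        print(f"MT active={mt}, KR active={kr}: {predicted_ahm_moiety(mt, kr)}")
```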

Alternatively, individual functional NRPS domains and/or the PKS module 
can be deleted or the PKS module can be duplicated in-frame to produce BLM analogs with 
shorter or longer backbone, respectively. Alternatively, or in addition, the NRPS domains or
the PKS module can be rearranged for the production of BLM analogs with a completely 
different backbone. The NRPS and PKS features can be combined into one integrated 
system, providing access to a structural variation not available by either the NRPS or PKS 
system alone. 

To create such mutations, plasmids are constructed carrying in-frame 
deletions of DNA segments encompassing a portion of the blm synthetase activities. 
Construction of specific deletions is preferably accomplished by one of the following two
strategies. The first involves subcloning of a DNA fragment in a gene replacement vector,
selection of two restriction sites suitably located at the two ends of the DNA segment, and
deletion of this segment from within the plasmid by rejoining the two resulting ends. An
in-frame deletion can be obtained by a suitable combination of Klenow filling and S1 treatment
of both ends prior to ligation.
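
Whether a planned deletion preserves the reading frame is simple arithmetic: the number of removed bases must be a multiple of three. The following Python sketch, using a made-up coding sequence and hypothetical function names, checks this and reports how many further bases would have to be removed to restore the frame (the kind of adjustment that end filling or trimming is meant to achieve).

```python
def delete_segment(cds: str, start: int, end: int) -> str:
    """Remove cds[start:end] (0-based, end exclusive) and rejoin the ends."""
    if not 0 <= start < end <= len(cds):
        raise ValueError("deletion coordinates are out of range")
    return cds[:start] + cds[end:]


def is_in_frame(start: int, end: int) -> bool:
    """A deletion keeps the downstream frame if its length is a multiple of 3."""
    return (end - start) % 3 == 0


def extra_bases_to_remove(start: int, end: int) -> int:
    """Number of additional bases to remove so the deletion becomes in-frame."""
    return -(end - start) % 3


cds = "ATGAAACCCGGGTGA"  # made-up ORF: ATG AAA CCC GGG TGA
print(delete_segment(cds, 3, 9), is_in_frame(3, 9))
print(is_in_frame(3, 10), extra_bases_to_remove(3, 10))
```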

The second approach involves polymerase chain reaction (PCR) amplification
of two DNA segments that flank the region to be deleted, followed by joining of the two
fragments in the correct orientation in a gene replacement vector. This can be accomplished
by designing PCR primers with suitable restriction sites. The restriction site used to generate
the deletion and the sequences to serve as templates for the PCR amplification are chosen so
as to generate two segments of blm synthetase DNA of approximately equal length in the
construction in order to maximize the chance of gene replacement. The gene replacement
vector containing the allelic or deletion mutation is introduced into a Streptomyces strain
(e.g., S. verticillus). Integration of the plasmid into the S. verticillus chromosome via a
single reciprocal homologous recombination will yield a recombinant that will be isolated by
selection for the vector marker. The resulting integrants are then grown under non-selective
conditions and further resolution by selection for the loss of the vector marker via the second 
homologous recombination event will produce the desired deletion mutants. 
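
The choice of two flanking segments of equal length can be sketched as follows. The sequence, coordinates, and function names are assumptions for illustration; real homology arms would be far longer than in this toy example.

```python
def max_equal_arm(seq: str, del_start: int, del_end: int) -> int:
    """Longest arm length available on both sides of the deletion."""
    return min(del_start, len(seq) - del_end)


def deletion_construct(seq: str, del_start: int, del_end: int, arm_len: int):
    """Return (left arm, right arm, joined deletion allele).

    The arms are the arm_len bases immediately upstream and downstream of
    seq[del_start:del_end]; joining them removes the intervening region.
    """
    if not 0 <= del_start < del_end <= len(seq):
        raise ValueError("deletion coordinates are out of range")
    if arm_len < 1 or arm_len > max_equal_arm(seq, del_start, del_end):
        raise ValueError("arm length does not fit on both sides")
    left = seq[del_start - arm_len:del_start]
    right = seq[del_end:del_end + arm_len]
    return left, right, left + right


locus = "A" * 10 + "C" * 6 + "G" * 10  # toy locus; the C run is deleted
left, right, allele = deletion_construct(locus, 10, 16, 8)
print(left, right, allele)
```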

Southern analysis of the isolated deletion mutants with the target DNA is 
performed to ensure that the expected double crossover recombination event has taken place. 
The first approach is convenient if there are suitably spaced restriction sites in the DNA 
sequence. The second approach enables the deletion of any DNA segment but may be 
limited by the size of the DNA segments that can be amplified by PCR. These S. verticillus
recombinants are cultured under typical conditions for BLM production and the fermentation
broth is screened for the production of any novel BLM analogs resulting from the specific
mutations in the blm synthetase locus. 

B) Synthesis of BLM by "random" modification of blm synthetase.

Bleomycin analogs can also be synthesized by randomly/haphazardly altering
genes in the blm cluster, expressing the products of the randomly modified megasynthetase,
and then screening the products for the desired activity. Methods of "randomly" altering blm 
cluster genes are described below. 

VI. Generation of other biosynthetic systems.

In addition to the production of bleomycin or modified bleomycins, the blm 
gene cluster or elements thereof can be used by themselves or in combination with NRPS
and/or PKS modules and/or enzymatic domains of other PKS and/or NRPS systems to
produce a wide variety of compounds including, but not limited to, various polyketides,
polypeptides, polyketide/polypeptide hybrids, various oxazoles and thiazoles, various sugars,
various methylated polypeptides/polyketides, and the like. As with the production of
modified bleomycins described above, such compounds can be produced, in vivo or in vitro,
by catalytic biosynthesis using large, modular PKSs, NRPSs, and hybrid PKS/NRPS
systems. The megasynthetases directing such syntheses can be rationally designed, e.g. by
predetermined alteration/modification of polyketide and/or polypeptide and/or hybrid
PKS/NRPS pathways. Alternatively, large combinatorial libraries of cells harboring various
megasynthetases can be produced by the random modification of particular pathways and
then selected for the production of a molecule or molecules of interest. It will be appreciated
that, in certain embodiments, such libraries of megasynthetases/modified pathways can be
used to generate large, complex combinatorial libraries of compounds which themselves can
be screened for a desired activity. 

A) Directed modification of biomolecules.

Elements (e.g. open reading frames) of the blm biosynthetic gene cluster 
and/or variants thereof can be used in a wide variety of "directed" biosynthetic processes (i.e. 
where the process is designed to modify and/or synthesize one or more particular preselected 
metabolites). Polypeptides encoded by particular open reading frames or combinations of
open reading frames can be utilized to perform particular chemical modifications of
biological molecules.

Thus, for example, open reading frames encoding a polypeptide synthetase can
be used to chemically modify an amino acid by coupling it to another amino acid. In another
example, the methyl transferase in BlmVIII can be utilized to introduce methyl groups into
polyketides and other substrates. The glycosyl transferases can be used to glycosylate
appropriate substrates, and so forth. These examples are merely illustrative. One of skill in
the art, utilizing the information provided here, can perform literally countless chemical
modifications and/or syntheses using either "native" bleomycin biosynthesis metabolites as
the substrate molecule, or other molecules capable of acting as substrates for the particular
enzymes in question. Other substrates can be identified by routine screening. Methods of
screening enzymes for specific activity against particular substrates are well known to those
of skill in the art.

The biosyntheses can be performed in vivo, e.g. by providing a host cell 
comprising the desired blm gene cluster open reading frame(s) and/or in vitro, e.g., by
providing the polypeptides encoded by the blm gene cluster ORFs and the appropriate
substrates and/or cofactors.

B) Directed engineering of biosynthetic pathways.

In numerous embodiments of this invention, novel polyketides, polypeptides,
and combinations thereof are created by modifying known PKSs or NRPSs so as to introduce 
variations into known polymers synthesized by the enzymes. Such variations may be 
introduced by design, for example to modify a known molecule in a specific way, e.g. by 
replacing a single monomeric unit within a polymer with another, thereby creating a
derivative molecule of predicted structure. Such variations can also be made by adding one
or more modules to a known PKS or NRPS, or by removing one or more modules from a
known PKS or NRPS. Such novel PKSs or NRPSs can readily be made using a variety of
techniques, including recombinant methods and in vitro synthetic methods. 

Using any of these methods, it is possible to introduce PKS domains into a 
NRPS, or vice versa, thereby creating novel molecules including both peptide and polyketide 
structural domains. For example, a PKS enzyme producing a known polyketide can be 
modified so as to include an additional module that adds a peptide moiety into the 
polyketide. Novel molecules synthesized using these methods can be screened, using 
standard methods, for any activity of interest, such as antibiotic activity, effects on the cell 

cycle, effects on the cytoskeleton, etc. 

Novel polyketides, polypeptides, or combinations thereof can also be made by 
creating novel PKSs or NRPSs de novo, using recombinant or in vitro synthetic methods. 
Such novel arrangements of domains can be designed, i.e. to create a specific polymer. In 
addition to creating novel PKSs or NRPSs by combining modules, the methods of this 
invention can also be used to make novel modules that can add new monomeric units to a 
growing polypeptide or polyketide chain. Because the identity of each module, and, 
consequently, the identity of the monomer added by the module, is determined by the 
identity and number of the functional domains comprising the module, it is possible to 
produce novel monomeric units by creating novel combinations of functional domains within 
a module. Such novel modules can be created by design, for example to make a specific 
module that will add a specific monomer to a polyketide or polypeptide, or can be created by 
the random association of domains so as to produce libraries of novel modules. Such novel 
modules can be made using recombinant or in vitro synthetic means. 

Mutations can be made to the native NRPS and/or PKS subunit sequences and 
such mutants used in place of the native sequence, so long as the mutants are able to function 
with other PKS and/or NRPS subunits to collectively catalyze the synthesis of an identifiable
polyketide and/or polypeptide. Such mutations can be made to the native sequences using 
conventional techniques such as by preparing synthetic oligonucleotides including the 
mutations and inserting the mutated sequence into the gene encoding a NRPS and/or PKS 
subunit using restriction endonuclease digestion (see, e.g., Kunkel (1985) Proc. Natl. Acad.
Sci. USA 82: 448; Geisselsoder et al. (1987) BioTechniques 5: 786). Alternatively, the 
mutations can be effected using a mismatched primer (generally 10-20 nucleotides in length) 
which hybridizes to the native nucleotide sequence, at a temperature below the melting 
temperature of the mismatched duplex. The primer can be made specific by keeping primer 
length and base composition within relatively narrow limits and by keeping the mutant base 
centrally located (Zoller and Smith (1983) Meth. Enzymol. 100: 468). Primer extension is
effected using DNA polymerase, the product is cloned, and clones containing the mutated
DNA, derived by segregation of the primer-extended strand, are selected. Selection can be
accomplished using the mutant primer as a hybridization probe. The technique is also
applicable for generating multiple point mutations (see, e.g., Dalbadie-McFarland et al. (1982)
Proc. Natl. Acad. Sci. USA 79: 6409). PCR mutagenesis will also find use for effecting the
desired mutations. 
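
A mismatched primer of the kind described above, with the mutant base centrally located, can be generated mechanically from the template. The sketch below is a minimal illustration: the template is a made-up repeat sequence, the function names are hypothetical, and the default flank of nine bases on each side (a 19-nucleotide primer) is simply one choice within the 10-20 nucleotide range mentioned in the text.

```python
def mutagenic_primer(template: str, pos: int, new_base: str, flank: int = 9) -> str:
    """Return a primer matching the template except for one central base.

    pos is the 0-based template position to change; the primer covers
    `flank` bases on each side, so its length is 2 * flank + 1.
    """
    start, end = pos - flank, pos + flank + 1
    if start < 0 or end > len(template):
        raise ValueError("not enough flanking sequence around the mutation")
    if template[pos] == new_base:
        raise ValueError("new base is identical to the template base")
    return template[start:pos] + new_base + template[pos + 1:end]


def count_mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


template = "ACGT" * 10           # made-up 40-nt template
primer = mutagenic_primer(template, 20, "G")
print(primer, len(primer), count_mismatches(primer, template[11:30]))
```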

C) Random modification of PKS/NRPS pathways.

In another embodiment, variations can be made randomly, for example by 
making a library of molecular variants of a known polymer by randomly mutating one or 
more PKS or NRPS modules and/or enzymatic domains or by randomly replacing one or 
more modules or enzymatic domains in a known PKS or NRPS with a collection of 
alternative modules and/or enzymatic domains.

The PKS and/or NRPS modules can be combined into a single multi-modular 
enzyme, thereby dramatically increasing the number of possible combinations obtained using 
these methods. These combinations can be made using standard recombinant or nucleic acid 
amplification methods, for example by shuffling nucleic acid sequences encoding various 
modules or enzymatic domains to create novel arrangements of the sequences, analogous to 
DNA shuffling methods described in Crameri et al. (1998) Nature 391: 288-291, and in U.S.
Patents 5,605,793 and 5,837,458. In addition, novel combinations can be made in vitro,
for example by combinatorial synthetic methods. Novel polymers, or polymer libraries, can 
be screened for any specific activity using standard methods. 
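
The growth in the number of possible multi-modular enzymes can be made concrete with a short enumeration. In this sketch each position in the assembly line has a list of interchangeable modules, and the library is the Cartesian product of those lists. The module labels are invented placeholders, not names from the blm cluster.

```python
from itertools import product


def module_library(alternatives):
    """Enumerate every ordered combination, one module chosen per position."""
    return list(product(*alternatives))


def library_size(alternatives):
    """Number of combinations: the product of the options at each position."""
    size = 1
    for options in alternatives:
        size *= len(options)
    return size


# Hypothetical module choices for a three-position assembly line.
alternatives = [
    ["NRPS-a", "NRPS-b"],
    ["PKS-a", "PKS-b", "PKS-c"],
    ["NRPS-c"],
]

library = module_library(alternatives)
print(library_size(alternatives))
print(library[0])
```

With k alternatives at each of n positions the library contains k to the power n members, which is why even modest module collections yield large libraries.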

Random mutagenesis of the nucleotide sequences obtained as described above 
can be accomplished by several different techniques known in the art, such as by altering 
sequences within restriction endonuclease sites, inserting an oligonucleotide linker randomly 
into a plasmid, by irradiation with X-rays or ultraviolet light, by incorporating incorrect 
nucleotides during in vitro DNA synthesis, by error-prone PCR mutagenesis, by preparing 
synthetic mutants or by damaging plasmid DNA in vitro with chemicals. Chemical mutagens 
include, for example, sodium bisulfite, nitrous acid, hydroxylamine, agents which damage or
remove bases thereby preventing normal base-pairing such as hydrazine or formic acid,
analogues of nucleotide precursors such as nitrosoguanidine, 5-bromouracil, 2-aminopurine,
or acridine intercalating agents such as proflavine, acriflavine, quinacrine, and the like.
Generally, plasmid DNA or DNA fragments are treated with chemicals, transformed into
E. coli and propagated as a pool or library of mutant plasmids.

Large populations of random enzyme variants can be constructed in vivo 
using "recombination-enhanced mutagenesis." This method employs two or more pools of, 
for example, 10^6 mutants each of the wild-type encoding nucleotide sequence that are
generated using any convenient mutagenesis technique, described more fully above, and then 
inserted into cloning vectors. 

C) Incorporation and/or modification of non-blm cluster elements.

In either the directed or random approaches, nucleic acids encoding novel 
combinations of modules and/or enzymatic domains are introduced into a cell. In one embodiment,
nucleic acids encoding one or more PKS or NRPS domains are introduced into a cell so as to
replace one or more domains of an endogenous PKS or NRPS within a chromosome of the
cell. Endogenous gene replacement can be accomplished using standard methods, such as
homologous recombination. Nucleic acids encoding an entire PKS, NRPS, or combination 
thereof can also be introduced into a cell so as to enable the cell to produce the novel 
enzyme, and, consequently, synthesize the novel polymer. In a preferred embodiment, such 
nucleic acids are introduced into the cell optionally along with a number of additional genes, 
together called a 'gene cluster,' that influence the expression of the genes, survival of the 
expressing cells, etc. In a particularly preferred embodiment, such cells do not have any 
other PKS- or NRPS- encoding genes or gene clusters, thereby allowing the straightforward 
isolation of the polymer synthesized by the genes introduced into the cell. 

Furthermore, the recombinant vector(s) can include genes from a single PKS
and/or NRPS gene cluster, or may comprise hybrid replacement PKS gene clusters with, e.g.,
a gene for one cluster replaced by the corresponding gene from another gene cluster. For
example, it has been found that ACPs are readily interchangeable among different synthases 
without an effect on product structure. Furthermore, a given KR can recognize and reduce 
polyketide chains of different chain lengths. Accordingly, these genes are freely 
interchangeable in the constructs described herein. Thus, the replacement clusters of the 
present invention can be derived from any combination of PKS and/or NRPS gene sets that 
ultimately function to produce an identifiable polyketide and/or peptide. 

Examples of hybrid replacement clusters include, but are not limited to, 
clusters with genes derived from two or more of the act gene cluster, the whiE gene cluster, 
frenolicin (fren), granaticin (gra), tetracenomycin (tcm), 6-methylsalicylic acid (6-msas),
oxytetracycline (otc), tetracycline (tet), erythromycin (ery), griseusin (gris), nanaomycin,
medermycin, daunorubicin, tylosin, carbomycin, spiramycin, avermectin, monensin,
nonactin, curamycin, rifamycin and candicidin synthase gene clusters, among others. (For a
discussion of various PKSs, see, e.g., Hopwood and Sherman (1990) Ann. Rev. Genet. 24:
37-66; O'Hagan (1991) The Polyketide Metabolites, Ellis Horwood Limited.)

A number of hybrid gene clusters have been constructed, having components 
derived from the act, fren, tcm, gris and gra gene clusters (see, e.g., U.S. Patent 5,712,146).
Other hybrid gene clusters, as described above, can easily be produced and screened using 
the disclosure herein, for the production of identifiable polyketides, polypeptides or 
polyketide/polypeptide hybrids. 

Host cells (e.g., Streptomyces) can be transformed with one or more vectors,
collectively encoding a functional PKS/NRPS set (e.g., for a bleomycin or bleomycin analog), or
a cocktail comprising a random assortment of PKS and/or NRPS genes, modules, active
sites, or portions thereof. The vector(s) can include native or hybrid combinations of PKS
and/or NRPS subunits or cocktail components, or mutants thereof. As explained above, the
gene cluster need not correspond to the complete native gene cluster but need only encode 
the necessary PKS and/or NRPS components to catalyze the production of the desired 
product. For example, in Streptomyces aromatic PKSs, carbon chain assembly requires the 
products of three open reading frames (ORFs). ORF1 encodes a ketosynthase (KS) and an 
acyltransferase (AT) active site (KS/AT); ORF2 encodes a chain length determining factor 
(CLF), a protein similar to the ORF1 product but lacking the KS and AT motifs; and ORF3
encodes a discrete acyl carrier protein (ACP). Some gene clusters also code for a
ketoreductase (KR) and a cyclase, involved in cyclization of the nascent polyketide
backbone. However, it has been found that only the KS/AT, CLF, and ACP need be present
in order to produce an identifiable polyketide. Thus, in the case of aromatic PKSs derived 
from Streptomyces, these three genes, without the other components of the native clusters, 
can be included in one or more recombinant vectors, to constitute a "minimal" replacement 
PKS gene cluster. 
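
The "minimal" cluster criterion described above can be expressed as a simple set check. The sketch below is illustrative only: the component labels follow the abbreviations used in the text, while the function names and the example cluster contents are hypothetical.

```python
# Components stated above to be sufficient for an aromatic Streptomyces PKS
# to produce an identifiable polyketide.
MINIMAL_PKS = frozenset({"KS/AT", "CLF", "ACP"})


def is_minimal_pks(components):
    """True if the given components include KS/AT, CLF and ACP."""
    return MINIMAL_PKS.issubset(components)


def missing_components(components):
    """Sorted list of minimal-PKS components absent from the given set."""
    return sorted(MINIMAL_PKS - set(components))


if __name__ == "__main__":
    # Hypothetical replacement clusters used only to exercise the check.
    full_cluster = {"KS/AT", "CLF", "ACP", "KR", "cyclase"}
    partial_cluster = {"KS/AT", "ACP", "KR"}
    print(is_minimal_pks(full_cluster), missing_components(full_cluster))
    print(is_minimal_pks(partial_cluster), missing_components(partial_cluster))
```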

Variation of starter and extender units.

In addition to varying the PKS and/or NRPS modules and/or domains, 
variations in the products produced by various PKS/NRPS systems can be obtained by 
varying the starter units and/or the extender units. Thus, for example, a considerable degree 
of variability exists for starter units, e.g., acetyl CoA, malonamyl CoA, propionyl CoA,
acetate, butyrate, isobutyrate and the like. In addition, naturally occurring PKSs and/or
NRPSs have shown some tolerance for varying extender units. 

E) Examples of preferred modifications.

As indicated above, the novel PKS and NRPS modules and enzymatic 
domains identified herein can be used to perform specific single modifications of particular 
substrates, or as components of complex synthetic pathways to generate particular products 
or large combinatorial libraries. As described in the Examples, a number of modules of the 
blm gene cluster provide novel functionality. By way of example, a few preferred reactions 
are listed below. These examples are intended to be illustrative and are not exhaustive nor 
limiting. 

1. Use of BlmVIII PKS to introduce a branched methyl group.

The blmVIII gene identified herein encodes a PKS module consisting of
domains characteristic for known PKSs, such as ketoacyl synthase (KS), acyltransferase
(AT), ketoreductase (KR), and ACP, with malonyl CoA acting as an extending unit.
However, the identification of an integrated methyltransferase (MT) domain in the middle of
BlmVIII is unique, representing the first PKS from actinomycetes that contains an internal
MT domain. The use of this methyltransferase domain allows the introduction of a branched
methyl group during a polyketide and/or polypeptide and/or hybrid
polyketide/polypeptide synthesis. Figure 5 illustrates the use of the BlmVIII PKS in engineering
a polyketide biosynthesis that introduces a branched methyl group.

The first formula in Figure 5 illustrates a polyketide synthesis mediated by
6-deoxyerythronolide B synthase (DEBS), which normally catalyzes the biosynthesis of the
erythromycin aglycone, 6-deoxyerythronolide B. The remaining formulas show how the use
of the blmVIII methyltransferase (MT) domain at different points in the synthesis results in the
introduction of a methyl group at different locations in the resulting product.

In view of this illustration, one of skill in the art would appreciate that the 
blmVIII MT domain can be used in a wide variety of biosyntheses to introduce methyl 
branches.

2. Use of the blm gene cluster to make thiazoline, thiazole, bithiazoline, and
bithiazole-containing compounds.

The BlmIV and BlmIII NRPSs are characterized by unusual Cy domains as
well as an unprecedented Ox domain, providing an efficient biosynthesis for a bithiazole
structure. While thiazoline is the direct product of the Cy domain, the thiazoline-to-thiazole
conversion generally is performed with an additional oxidation step. We identified at the
C-terminus of NRPS-0 an additional domain that shows low, but significant, sequence
homology to a family of putative oxidases/dehydrogenases, including the McbC protein of
the microcin B17 synthase (Table 1). Microcin B17 synthase catalyzes the synthesis of the
oxazole and thiazole-containing peptide antibiotic microcin B17, and McbC has been
proposed to play a role in catalyzing the oxazoline/thiazoline-to-oxazole/thiazole conversion.
Consequently, we propose that this extra domain at the C-terminus of NRPS-0 provides the
oxidase/dehydrogenase activity for the biosynthesis of the bithiazole moiety of BLM,
defining a novel Ox domain for NRPSs.

It is noteworthy that a cell-free preparation from Sv ATCC15003 has been
reported to catalyze the conversion of phleomycins to BLMs in the presence of NAD+,
supporting the hypothesis that the bithiazole moiety of BLM results from stepwise
oxidations of a bithiazoline precursor (Fig. 1A). (The phleomycin producer could be
imagined to result from the loss of its Ox activity for the first thiazoline ring.) Given the
wide distribution of thiazole or oxazole rings in natural products exhibiting an impressive
array of biological activities, the cloning of the blmIV and blmIII genes and the identification of the
Ox domain open many opportunities to study thiazole biosynthesis and to synthesize novel
thiazole-containing molecules by engineering peptide biosynthesis.

Representative thiazole syntheses using variants of the blm NRPS are
illustrated in Figure 6. Note that in Figure 6, A_M and A_N refer to A domains that activate
an amino acid with R_M and R_N groups, respectively. A_C refers to an A domain that
activates Cys (X = SH) or Ser (X = OH) that can be cyclized to form the oxazoline/thiazoline
or oxazole/thiazole structures. DH is a dehydratase. In view of these representative
examples, one of skill in the art would appreciate that the blm NRPS domain and its variants
can be used in a wide variety of chemical syntheses to make thiazolidine, thiazoline, thiazole,
bi-thiazolidine, bithiazoline, or bithiazole-containing compounds.

3. Use of the blm gene cluster to make heterocyclic ring-containing
compounds.

Various blm modules can be used to produce heterocyclic ring-containing 
compounds. Such heterocycles include, but are not limited to, five-membered S- and N-
containing compounds of the thiazolidine, thiazoline and thiazole family or the O- and N-
containing compounds of the oxazolidine, oxazoline, and oxazole family. Again, the
preparation of such compounds is illustrated in Figure 6. 

4. Use of the blm gene cluster to make sugars.

In still another embodiment, the blm gene cluster or elements thereof can be 
used to make sugars. Such sugars include, but are not limited to L-sugars (with the BlmG 
epimerase), sugars modified by a carbamoyl group (e.g., using BlmD), and various
disaccharides. Representative examples of such syntheses are illustrated in Figure 7. Such
sugar biosynthesis genes can also be used to attach sugars onto other polyketide and/or
peptide aglycones. 

F) Screening of products. 

Particularly where large combinatorial libraries are synthesized, e.g., using one
or more modules and/or enzymatic domains of the blm gene cluster, it will often be desired to
screen the resulting compound(s) for the desired activity. Methods of screening compounds
(e.g., polypeptides, polyketides, sugars, thiazoles, etc.) for various activities of interest (e.g.,
cytotoxicity, antimicrobial activity, particular chemical activities, etc.) are well known to
those of skill in the art.

Where large numbers of compounds are produced, it is often desired to 
rapidly screen such compounds using "high throughput systems" (HTS). High throughput 
assay systems are well known to those of skill in the art and many such systems are
commercially available (see, e.g., Zymark Corp., Hopkinton, MA; Air Technical Industries,
Mentor, OH; Beckman Instruments, Inc., Fullerton, CA; Precision Systems, Inc., Natick,
MA, etc.). These systems typically automate entire procedures including all sample and
reagent pipetting, liquid dispensing, timed incubations, and final readings of the microplate
in detector(s) appropriate for the assay. These configurable systems provide high
throughput and rapid start up as well as a high degree of flexibility and customization. The
manufacturers of such systems typically provide detailed protocols for the various high
throughput screens.

VII. In Vitro syntheses.

In additional embodiments of this invention, bleomycins and other
polyketides and/or polypeptides are synthesized and/or modified in vitro. Individual
enzymatic domains or modules can be used in vitro to modify a unit and/or to add a single
monomeric unit to a growing polyketide or polypeptide chain. In one approach a
megasynthetase providing all the desired synthetic activities is recombinantly expressed and
then provided with the appropriate substrates and buffer system in a bioreactor, to direct the
synthesis of the desired product. In another approach, various PKSs and/or NRPSs are
provided in different solutions and the growing polymer chains can be sequentially
introduced into the plurality of solutions, each containing a single (or several) PKS or NRPS
modules. In still another embodiment, the PKS and/or NRPS modules or enzymatic domains
are provided attached to a solid support and a fluid containing the growing macromolecule
is passed over the surface whereby the PKSs or NRPSs are able to react with the target
substrate.

In one preferred embodiment a combinatorial library of polyketides or
polypeptides, or combinations thereof, is created by using automated means to facilitate the
sequential introduction of a multitude of polymeric chains, each attached to a solid support,
to a collection of solutions, each containing a single PKS or NRPS module. These
automated means can be used to systematically vary the sequence by which each polymeric
chain is introduced into the various solutions, thereby creating a combinatorial library.
Numerous methods are well known in the art to create combinatorial libraries
by the sequential addition of monomeric units, for example as described in WO 97/02358.

VIII. Kits. 

In still another embodiment, this invention provides kits for practice of the
methods described herein. In one preferred embodiment, the kits comprise one or more
containers containing nucleic acids encoding one or more of the blm gene cluster ORFs
and/or one or more of the BLM PKS or NRPS modules or enzymatic domains. Certain kits
may comprise vectors encoding the blm ORFs and/or cells containing such vectors. The kits
optionally include any reagents and/or apparatus to facilitate practice of the assays
described herein. Such reagents include, but are not limited to, buffers, labels, labeled
antibodies, bioreactors, cells, etc.

In addition, the kits may include instructional materials containing directions
(i.e., protocols) for the practice of the methods of this invention. Preferred instructional
materials provide protocols utilizing the kit contents for creating or modifying a blm module or
ORF and/or for synthesizing or modifying a molecule using one or more blm modules and/or
enzymatic domains. While the instructional materials typically comprise written or printed
materials they are not limited to such. Any medium capable of storing such instructions and
communicating them to an end user is contemplated by this invention. Such media include,
but are not limited to, electronic storage media (e.g., magnetic discs, tapes, cartridges, chips),
optical media (e.g., CD ROM), and the like. Such media may include addresses to internet
sites that provide such instructional materials.

EXAMPLES 

The following examples are offered to illustrate, but not to limit the claimed
invention.

Example 1 

Bleomycin biosynthesis in Streptomyces verticillus ATCC15003: A model for hybrid
peptide and polyketide biosynthesis.

Here we report the cloning and characterization of the blm biosynthesis gene
cluster from Sv ATCC15003 (Fig. 2). Sequence analysis and biochemical characterization of
individual modules enabled us to align the nine NRPS and one PKS modules in a linear order
to constitute the Blm megasynthetase complex (Fig. 1B). These studies revealed several
unprecedented features for peptide and polyketide biosynthesis, setting the stage to 
investigate the molecular basis for intermodular communication between NRPS and PKS, 
and supported the wisdom of combining individual NRPS and PKS modules for 
combinatorial biosynthesis to make novel "unnatural" natural products from amino acids and 
short carboxylic acids. 

Materials and Methods. 

General procedures. 

Escherichia coli DH5α (Sambrook et al. (1989) Molecular Cloning: A
Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor,
USA), E. coli XL1-Blue MR (Stratagene, La Jolla, CA), E. coli BL21(DE-3) (Novagen,
Madison, WI), and Sv ATCC15003 (American Type Culture Collection, Rockville, MD)
were used in this work. pOJ446 (Agricultural Research Service Culture Collection, Peoria,
IL), pQE60 (Qiagen, Santa Clarita, CA), pET28a and pET29a (Novagen), and other plasmids
were from commercial sources. E. coli (Sambrook, supra) and Sv ATCC15003 strains
(Hopwood et al. (1985) Genetic Manipulation of Streptomyces: A Laboratory Manual, The
John Innes Foundation, Norwich, UK) were cultured under standard conditions.

Plasmid preparation was carried out by using commercial kits (Qiagen). Total
Sv ATCC15003 DNA was isolated according to literature protocols (Hopwood et al. (1985)
Genetic Manipulation of Streptomyces: A Laboratory Manual, The John Innes Foundation,
Norwich, UK; Nagaraja et al. (1987) Methods Enzymol. 153: 166-198). Restriction enzymes
and other molecular biology reagents were from commercial sources, and digestions and
ligations followed standard methods (Sambrook, supra). For Southern analysis, digoxigenin
labelling of DNA probes, hybridization, and detection were performed according to the
protocols provided by the manufacturer (Boehringer Mannheim Biochemicals, Indianapolis,
IN).

Automated DNA sequencing was carried out on an ABI Prism 377 DNA 
Sequencer (Perkin-Elmer/ABI, Foster City, CA), and this service was provided by either the 
DBS Automated DNA Sequencing Facility, UC Davis, or Davis Sequencing (Davis, CA).
Data were analyzed by the ABI Prism Sequencing 2.1.1 software and the Genetics Computer 
Group (GCG) program (Madison, WI). 

Cloning and sequencing of the blm gene cluster.

A genomic library of Sv ATCC15003 was constructed in pOJ446 according to
literature procedures (Nagaraja et al. (1987) Methods Enzymol. 153: 166-198) and screened
with probes made from both ends of the blmAB locus (Sugiyama et al. (1994) Gene 151: 11-16;
Calcutt and Schmidt (1994) Gene 151: 17-21), leading to the localization of 140-kb
contiguous DNA, of which 100-kb is upstream (Fig. 2) and 40-kb is downstream (data not
shown) of the blmAB genes. Heterologous NRPS probes were amplified from Sv
ATCC15003 by polymerase chain reaction (PCR) according to literature procedures (Turgay
and Marahiel (1994) Peptide Res. 7: 238-241) and used to screen the entire 140-kb DNA by
Southern analysis under various hybridization conditions (Shen et al. (1999) Bioorg. Chem.
27: 155-171).

Prediction of substrate specificity of NRPSs.

The nine Blm NRPS modules were compared with eighty-four modules from
various bacterial and fungal NRPSs available at the GenBank, including those with known or
putative specificity for amino acids present in BLM. A table of overall similarities/identities
was generated by PILEUP analysis of the A3 to A6 regions, and the residues lining the
substrate binding pocket by comparison with PheA (Conti et al. (1997) EMBO J. 16: 4174-4183)
were determined by PILEUP/PRETTY analysis. The percentage similarities for each
Blm NRPS module were plotted against the rest of the NRPS modules to display the overall
sequence homology between the A3 to A6 regions. Those modules that showed significantly
higher homology were selected to compare the amino acid residues that line the substrate
binding pocket.
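
The comparison step described above can be sketched in Python as a percent-identity calculation over sequences that have already been aligned. This is only a simplified stand-in and is not the GCG PILEUP/PRETTY analysis that was actually used; the function names and the short toy sequences are hypothetical and are not taken from any Blm or GenBank module.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues between two pre-aligned sequences,
    skipping alignment columns where either sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def rank_by_identity(query, references):
    """Return (name, percent identity) pairs, highest identity first."""
    scores = [(name, percent_identity(query, seq))
              for name, seq in references.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy aligned fragments standing in for A3-to-A6 regions.
    query = "ACDEFGHIKL"
    references = {
        "ref_1": "ACDEFGHIKL",
        "ref_2": "ACDEFAAAAA",
        "ref_3": "AC-EFGHIKV",
    }
    for name, score in rank_by_identity(query, references):
        print(name, round(score, 1))
```

Reference modules at the top of such a ranking would then be the candidates whose pocket-lining residues are compared in detail.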

Overproduction and biochemical characterization of the NRPS-1A and NRPS-6A
proteins.

Heterologous expression of the A domains in E. coli was performed according
to literature procedures (Mootz and Marahiel (1997) J. Bacteriol. 179: 6843-6850). NRPS-1A
(forward primer 5'-AAC CCA TGG CTG CTT CCC TGA CCC GCC TGG CC-3', SEQ
ID NO:76, and reverse primer 5'-CCT AGA TCT ACG GGC AGO TGG GGC GGT-3',
SEQ ID NO:77) and NRPS-6A (forward primer 5'-GGG AAT TCC ATA TGA TCC TCA
CGT CCT TCC AC-3', SEQ ID NO:78, and reverse primer 5'-GGC AAG CTT GGG TGA
GGG TCC GTT CGG T-3', SEQ ID NO:79) were amplified by PCR from Sv ATCC15003
cosmid clones. The resulting 1.6-kb fragment of NRPS-1A was first cloned into the
NcoI/BglII sites of pQE60 and then moved as an NcoI/HindIII fragment into the similar sites
of pET29a to yield pBS10, and the resulting 1.6-kb fragment of NRPS-6A was directly
cloned into the NdeI/HindIII sites of pET28a to yield pBS11. Introduction of pBS10 and
pBS11 into E. coli BL21(DE-3) under standard expression conditions resulted in production
of NRPS-1A (with an N-terminal S-tag and a C-terminal His6-tag) and NRPS-6A (with an
N-terminal His6-tag), respectively. The soluble fractions of fusion proteins were subjected
sequentially to affinity chromatography on Ni-NTA resin and anion exchange
chromatography on a Hyper-D column (PerSeptive Biosystems, Framingham, MA), resulting
in NRPS-1A and NRPS-6A with near homogeneity.
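
The cloning strategy above relies on restriction sites engineered into the primers. As an illustrative check, the Python sketch below scans three of the primers listed above for the recognition sequences of NcoI, BglII, NdeI and HindIII. The dictionary keys and function name are hypothetical; the recognition sequences are the standard ones for these enzymes, and because all four are palindromic, scanning one strand is sufficient.

```python
# Standard recognition sequences of the enzymes named in the text.
SITES = {
    "NcoI": "CCATGG",
    "BglII": "AGATCT",
    "NdeI": "CATATG",
    "HindIII": "AAGCTT",
}

# Primers as given in the text (5' to 3'); labels are for illustration only.
PRIMERS = {
    "NRPS-1A forward": "AAC CCA TGG CTG CTT CCC TGA CCC GCC TGG CC",
    "NRPS-6A forward": "GGG AAT TCC ATA TGA TCC TCA CGT CCT TCC AC",
    "NRPS-6A reverse": "GGC AAG CTT GGG TGA GGG TCC GTT CGG T",
}


def find_sites(primer, sites=SITES):
    """Map each enzyme whose site occurs in the primer to the 0-based
    position of the first occurrence (spaces in the primer are ignored)."""
    seq = primer.replace(" ", "").upper()
    return {name: seq.find(site) for name, site in sites.items() if site in seq}


if __name__ == "__main__":
    for label, primer in PRIMERS.items():
        print(label, find_sites(primer))
```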
Results and Discussion.

Cloning of the blm gene cluster from Sv ATCC15003.

Davies and co-workers previously cloned two BLM resistance genes (blmA
and blmB) from Sv ATCC15003 (Sugiyama et al. (1994) Gene 151: 11-16), and Calcutt and
Schmidt ((1994) Gene 151: 17-21) sequenced a 7.2-kb DNA fragment flanking the blmAB
genes, revealing seven open reading frames (orfs), none of which were found to encode Blm
NRPS or PKS enzymes. Given the precedent that antibiotic production genes commonly
occur as a cluster in actinomycetes, we adopted an approach combining chromosomal
walking from the blmAB resistance locus and DNA hybridization with heterologous NRPS
probes to clone and identify the blm cluster, leading to the localization of 140-kb contiguous
Sv ATCC15003 DNA. DNA sequencing of approximately 90-kb of the blm gene cluster,
including the 7.2-kb blmAB locus, revealed 40 ORFs (Fig. 2). Preliminary functional
assignments were made by comparison of the deduced gene products with proteins of known
functions in the database. Among the ORFs identified from the blm cluster, we indeed found
a PKS module flanked by several NRPS modules (a fact that supports the hybrid
NRPS/PKS/NRPS hypothesis for BLM biosynthesis), along with several sugar biosynthesis
genes and genes encoding other biosynthesis enzymes as well as several resistance and
regulatory genes (Table 1).

Noteworthy are the genes encoding the putative NRPS and PKS enzymes.
The blmI, blmII, and blmXI genes encode NRPSs with an unusual architecture. In contrast to
all known NRPSs, which are of modular organization with each module consisting
minimally of a condensation (C), an adenylation (A), and a peptidyl carrier protein (PCP)
domain (1), BlmI, BlmII, and BlmXI are discrete proteins homologous to individual domains
of type I NRPSs. We have characterized BlmI as a type II PCP (18). The BlmII and BlmXI
proteins could serve as candidates for type II condensation enzymes. It is unclear yet what
role, if any, these discrete NRPS enzymes could play in BLM biosynthesis.

The blmIII, blmIV, blmV, blmVI, blmVII, blmIX, and blmX genes encode
modular NRPSs consisting of domains characteristic for known type I NRPSs (A special
thematic issue on polyketide and nonribosomal polypeptide biosynthesis, (1997) Chem. Rev.
97: 2463-2706), such as the A, PCP, C, and condensation/cyclization (Cy) domains (Konz et
al. (1997) Chem. Biol. 4: 927-937), as well as an unprecedented oxidation (Ox) domain (see
discussion below). However, BlmVI is unique among all the Blm NRPSs identified. Its
N-terminal module (NRPS-5) consists of an atypical A domain, which bears a close
resemblance to a family of acyl CoA synthases (Fitzmaurice and Kolattukudy (1997) J.
Bacteriol. 179: 2608-2615; Fitzmaurice and Kolattukudy (1998) J. Biol. Chem. 273: 8033-8039),
and an acyl carrier protein (ACP)-like domain (A special thematic issue on polyketide
and nonribosomal polypeptide biosynthesis, (1997) Chem. Rev. 97: 2463-2706). Its
C-terminal module is truncated and presumably interacts with BlmV to constitute the complete
NRPS-3 module (Fig. 1B). Also noteworthy are the C domain of NRPS-3 that lacks both
His residues of the conserved HHxxxDG (SEQ ID NO:4) active site for transpeptidation
(Stachelhaus et al. (1998) J. Biol. Chem. 273: 22773-22781) and the extra C domain at the
C-terminus of BlmV. These unusual features associated with BlmVI and BlmV may play
roles in the formation of the β-aminoalaninamide and the pyrimidine moieties of BLM,
which are unprecedented in peptide biosynthesis. For example, we propose that the
NRPS-activated Ser is first dehydrated into dehydroalanine before condensation; an analogous
Thr-to-2,3-dehydroaminobutyric acid dehydration has been observed in syringomycin
biosynthesis (Guenzi et al. (1998) J. Biol. Chem. 273: 32857-32863). Conjugate addition to
dehydroalanine by Asn on the NRPS-3 module downstream, followed by an aminolysis to
cleave the Ser-Asn adduct off the Blm megasynthetase, furnishes the β-aminoalaninamide
moiety (Fig. 1B). The former reaction could be catalyzed by the C domain of NRPS-3 that
apparently is nonfunctional for normal transpeptidation due to the lack of the active sites,
and the latter reaction could be catalyzed by the acyl CoA synthase-like domain of NRPS-5
in a process that resembles the acyl CoA synthase-catalyzed synthesis of acyl CoA from
carboxylic acid (Stachelhaus et al. (1998) J. Biol. Chem. 273: 22773-22781; Guenzi et al.
(1998) J. Biol. Chem. 273: 32857-32863) but in the reverse direction in the presence of an
amino donor (Fig. 1B).

The blmVIII gene encodes a PKS module consisting of domains characteristic
for known PKSs, such as ketoacyl synthase (KS), acyltransferase (AT), ketoreductase (KR),
and ACP, with malonyl CoA acting as an extending unit according to sequence comparison
of the AT domain (Haydock et al. (1995) FEBS Lett. 374: 246-248) (Fig. 1B). However, the
identification of an integrated methyltransferase (MT) domain (Kagan and Clarke (1994)
Arch. Biochem. Biophys. 310: 417-427) in the middle of BlmVIII is unique, representing the
first PKS from actinomycetes that contains an internal MT domain. The only other example
is found in the yersiniabactin biosynthetic
gene cluster (Pelludat et al. (1998) J. Bacteriol. 180: 538-546). It has been assumed that
fungal PKSs in general contain internal MTs for the introduction of methyl branches into the
polyketide products, as has been shown recently in lovastatin biosynthesis (Kennedy et al.
(1999) Science 284: 1368-1372).

The Blm megasynthetase-templated assembly of BLM.

According to the hybrid NRPS/PKS/NRPS model for BLM biosynthesis (Fig.
1A), we predict a linear modular organization of individual NRPS and PKS modules to
constitute the Blm megasynthetase. Thus, the first functional domain of the Blm
megasynthetase should be an NRPS module that initiates BLM biosynthesis by activating
L-Ser as an aminoacyl thioester to set the stage for transpeptidation. Chain elongation
proceeds by sequential incorporation of L-Asn, L-Asn, L-His, and L-Ala, requiring four
additional NRPS modules. In the next step, a malonate reacts with the resulting pentapeptide
intermediate to form a β-ketothioester intermediate that is subsequently methylated at the
α-position and reduced at the β-keto group. A PKS module presumably dictates all these
biosynthetic events and interacts with the aligned NRPS module upstream to channel the
growing peptide intermediate from an NRPS module to a PKS module. After one cycle of
polyketide elongation, peptide elongation is resumed by incorporation of an L-Thr residue.
This step is presumably catalyzed by an NRPS module that interacts with the upstream PKS
module to channel the growing polyketide intermediate (as far as the active site is concerned)
from a PKS module to an NRPS module. At this stage, methylation occurs at the pyrimidine
moiety of the growing intermediate, presumably catalyzed by a discrete methyltransferase;
chain elongation is continued by three additional NRPS modules that incorporate a β-Ala
and two L-Cys molecules sequentially. Finally, the fully assembled BLM
peptide/polyketide/peptide backbone is hydroxylated at the β-position of the His residue,
presumably by a discrete hydroxylase, and released from the Blm megasynthetase complex
via nucleophilic substitution of the RCO-S-PCP species by a terminal amine to form the
BLM aglycone. Intermediates after five of the nine proposed elongation steps were in fact
isolated as P-3, P-3A, P-3K, P-4, P-5, P-5m, P-6m, and P-6mo (Takita and Muroka (1990)
pages 289-309 in Biochemistry of Peptide Antibiotics: Recent Advances in the Biotechnology
of β-Lactams and Microbial Peptides, Kleinkauf, H. & von Döhren, H. eds., W. de Gruyter,
N.Y.), which presumably resulted from premature departure from the Blm megasynthetase
complex before the chain reaches its full length (Fig. 1B).
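
The predicted order of substrate incorporation described above can be written out as an ordered list, which makes the module count and the two NRPS/PKS hand-off points explicit. The Python sketch below only restates the order given in the text; the data structure and function names are hypothetical.

```python
from collections import Counter

# Predicted order of incorporation, as described in the text:
# (module type, substrate incorporated).
ASSEMBLY_ORDER = [
    ("NRPS", "L-Ser"),
    ("NRPS", "L-Asn"),
    ("NRPS", "L-Asn"),
    ("NRPS", "L-His"),
    ("NRPS", "L-Ala"),
    ("PKS", "malonate"),
    ("NRPS", "L-Thr"),
    ("NRPS", "beta-Ala"),
    ("NRPS", "L-Cys"),
    ("NRPS", "L-Cys"),
]


def count_module_types(order):
    """Count how many modules of each type the assembly line uses."""
    return Counter(kind for kind, _ in order)


def find_handoffs(order):
    """Return (position, from_type, to_type) wherever the growing chain
    passes between module types (NRPS to PKS or PKS to NRPS)."""
    handoffs = []
    for i in range(1, len(order)):
        previous_kind = order[i - 1][0]
        current_kind = order[i][0]
        if previous_kind != current_kind:
            handoffs.append((i, previous_kind, current_kind))
    return handoffs


if __name__ == "__main__":
    print(dict(count_module_types(ASSEMBLY_ORDER)))
    print(find_handoffs(ASSEMBLY_ORDER))
```

The two hand-off positions correspond to the peptide-to-polyketide and polyketide-to-peptide transfers discussed above.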

Most of the bacterial NRPS gene clusters characterized to date are organized 
in operon-type structures, encoding multimodular NRPS proteins with individual modules 
organized along the chromosome in a linear order that parallels the order of the amino acids 
in the resultant peptides, i.e., following the "colinearity rule" for the NRPS-templated 
assembly of peptides from amino acids (A special thematic issue on polyketide and 
nonribosomal polypeptide biosynthesis, (1997) Chem. Rev. 97: 2463-2706; Cane et al. 
(1998) Science 282: 63-68). Inspection of the blm gene cluster (Fig. 2) showed that the Blm 
NRPS and PKS modules apparently are not organized according to the "colinearity rule" for 
BLM biosynthesis (Fig. 1). [Exception to the "colinearity rule" was also noted in the
syringomycin synthetase gene cluster (Guenzi et al. (1998) J. Biol. Chem. 273: 32857-
32863), and in fact, Grandi and co-workers have demonstrated recently in Bacillus subtilis 
that neither the operon-type structure nor the physical linkage of individual modules is 
essential for proper assembly and activity of the surfactin NRPS megasynthetase (Guenzi et 
al. (1998) J. Biol. Chem. 273: 14403-14410).] Realizing that the BLM biosynthesis cannot 
be rationalized according to the "colinearity rule", wc determined the substrate specificity of 
individual NRPS and PKS modules in an attempt to shed light on the modular organization 
of the Blm megasynthetase complex. Brick and co-workers postulated, based on the X-ray
structural analysis of the A domain of GrsA, PheA, that the region between core sequences
A3 and A6 represents the amino acid specificity determinant of an NRPS module (Conti et al.
(1997) EMBO J. 16: 4174-4183). Since the A domains in all known NRPSs share
significant sequence identity (ensuring that the main chain conformation of the enzymes is
likely to be very similar), they further proposed that the differing substrate specificity of
individual NRPS modules is mainly determined by the nature of the amino acids lining
the substrate binding pocket (Stachelhaus et al. (1999) Chem. Biol. 6: 493-505; Conti et al.
(1997) EMBO J. 16: 4174-4183). Given this structural information and the large number of
NRPS sequences available in GenBank, we developed a novel approach for predicting
substrate specificity for NRPS modules by comparing both the overall sequence of the A3
to A6 region and the eight amino acid residues that line the substrate binding pocket.
While a constant level of similarities (30%-40%) was evident among all the NRPS modules 
analyzed, most of the Blm NRPS modules showed striking similarities (50%-60%) to a 
particular cluster of NRPS modules as exemplified in Fig. 3A for NRPS-1 and NRPS-6. 
Close examination of these modules clustered with higher similarities revealed that they 
activate the same or very similar amino acid, based on which the putative substrate for the 
NRPS in query could be predicted, i.e., NRPS-1 and NRPS-6A activate L-Cys and L-Thr, 
respectively. These predictions were further supported by comparing the residues lining the 
substrate binding pocket. For example, the amino acid residues lining the substrate binding 
pocket for NRPS-1 and NRPS-6 are almost identical to those NRPS modules that are known 
to activate L-Cys and L-Thr, respectively, as shown in Fig. 3B. To verify the predicted 
amino acid specificity, we overproduced and purified the NRPS-1A and NRPS-6A proteins
(Fig. 3C) and examined their substrate specificity according to the amino acid-dependent
ATP-PPi exchange assay (Lee et al. (1970) Meth. Enzymol. 43: 585-602; Ku et al. (1997)
Chem. Biol. 4: 203-207). NRPS-1A and NRPS-6A indeed specifically activate L-Cys and L-Thr,
respectively, among the amino acids tested (Fig. 3D). These results greatly enhanced our
confidence in predicting the substrate specificity of an NRPS module by the above method.
We subsequently determined the substrate specificity for all the NRPS modules identified
from the blm gene cluster, and they in fact accounted for all nine amino acids required for
BLM biosynthesis (Fig. 2).
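
The comparison step described above can be illustrated with a minimal sketch. It is not the procedure actually used here: it assumes the eight pocket-lining residues have already been extracted and aligned, scores a query against reference signatures by simple percent identity, and reports the best-matching substrate. The function names and the eight-letter strings are hypothetical placeholders and do not represent real NRPS sequences.

```python
from typing import Dict, List, Tuple


def percent_identity(a: str, b: str) -> float:
    """Percent of positions carrying the same residue in two
    pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def predict_substrate(query: str,
                      references: Dict[str, List[str]]) -> Tuple[str, float]:
    """Return the substrate whose reference signatures best match the
    query, together with that best percent identity."""
    best_substrate, best_score = "", -1.0
    for substrate, signatures in references.items():
        for signature in signatures:
            score = percent_identity(query, signature)
            if score > best_score:
                best_substrate, best_score = substrate, score
    return best_substrate, best_score


# Toy eight-residue "pocket signatures": placeholders only, NOT real data.
TOY_REFERENCES = {
    "L-Cys": ["ACDEFGHI", "ACDEFGHK"],
    "L-Thr": ["LMNPQRST"],
}

if __name__ == "__main__":
    print(predict_substrate("ACDEFGKK", TOY_REFERENCES))
```

In practice the same scoring would be applied twice, once to the whole A3 to A6 region and once to the pocket residues, and a prediction would be accepted only when both comparisons point to the same group of reference modules.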

Using the substrate specificity of individual NRPS and PKS modules as a
guide, we can align the nine NRPS modules and one PKS module to constitute the Blm
megasynthetase as shown in Fig. 1B according to our hybrid NRPS/PKS/NRPS model for
BLM biosynthesis (Fig. 1A). Among all the PKS or NRPS systems examined so far, the
Blm megasynthetase consists of the largest number of individual proteins. The precise
interactions among all the Blm NRPS and Blm PKS proteins to constitute the Blm
megasynthetase complex, therefore, reflect a remarkable power of protein-protein
recognition (Guenzi et al. (1998) J. Biol. Chem. 273: 14403-14410; Gokhale et al. (1999)
Science 284: 482-485). Although we have yet to provide direct evidence supporting the
specific protein-protein interactions between the neighboring proteins, it is striking to note
that all the biosynthetic intermediates isolated are derailed from either PKS or NRPS
modules at the junctions between the interacting proteins (Fig. 1B). Since it is not difficult
to imagine that an intermediate is more likely to fall off the enzyme complex when it is
subjected to interpeptide transfer than to intrapeptide transfer, we view the latter observation
as strong evidence supporting the current model of the Blm megasynthetase
BlmIX/BlmVIII/BlmVII as a hybrid NRPS/PKS/NRPS system.

Recent biosynthetic studies on rapamycin in Streptomyces hygroscopicus
(Konig et al. (1997) Eur. J. Biochem. 247: 526-534), yersiniabactin in Yersinia
enterocolitica and Y. pestis (Pelludat et al. (1998) J. Bacteriol. 180: 538-546; Gehring et al.
(1998) Chem. Biol. 5: 573-586; Gehring et al. (1998) Biochemistry 37: 11637-11650) and
TA in Myxococcus xanthus (Paitan et al. (1999) J. Mol. Biol. 286: 465-474) are starting to
shed light on hybrid peptide and polyketide biosynthesis. Two models are emerging for the
alignment between an NRPS and a PKS module. The interacting NRPS and PKS modules
could be either covalently linked by arranging all domains in a linear order on the same
protein (Pelludat et al. (1998) J. Bacteriol. 180: 538-546; Gehring et al. (1998) Chem. Biol.
5: 573-586; Gehring et al. (1998) Biochemistry 37: 11637-11650; Paitan et al. (1999) J. Mol.
Biol. 286: 465-474) or physically located on two separate proteins, requiring specific
protein-protein recognition to ensure the correct pairing between the interacting modules
(Pelludat et al. (1998) J. Bacteriol. 180: 538-546; Konig et al. (1997) Eur. J. Biochem. 247:
526-534; Gehring et al. (1998) Chem. Biol. 5: 573-586; Gehring et al. (1998) Biochemistry
37: 11637-11650). Common to all these systems, however, are the unusual features associated
with the interacting modules, such as the lack of the AT domain of the PKS module in Ta1
(Paitan et al. (1999) J. Mol. Biol. 286: 465-474) and the lack of the A domain and the
presence of the Cy domain of the NRPS modules in both HMWP1 and HMWP2 (Pelludat et al.
(1998) J. Bacteriol. 180: 538-546; Gehring et al. (1998) Chem. Biol. 5: 573-586; Gehring et al.
(1998) Biochemistry 37: 11637-11650). While extremely intriguing, the latter features
complicate mechanistic analysis of these systems, making them less ideal candidates for
studying how NRPS and PKS integrate into a productive hybrid NRPS/PKS complex.

The BlmIX/BlmVIII/BlmVII system combines the features of both hybrid
NRPS/PKS and PKS/NRPS systems, serving as an ideal model for studying hybrid peptide
and polyketide biosynthesis. The fact that the BlmIX and BlmVII NRPS modules and
the BlmVIII PKS module are three separate proteins, each with a typical domain
organization for NRPS and PKS enzymes, greatly simplifies the mechanistic analysis of the
hybrid NRPS/PKS/NRPS complex. We have found that the KS domain of BlmVIII is more
similar to the KSs of HMWP1 (Pelludat et al. (1998) J. Bacteriol. 180: 538-546) and Ta1
(Paitan et al. (1999) J. Mol. Biol. 286: 465-474), both of which catalyze the elongation of a
peptidyl intermediate with a malonate, than to KSs of type I PKSs. We attribute these subtle
differences to their unique reactivity in catalyzing the transfer of the peptidyl intermediate
from the PCP to the KS domain, which presumably takes place prior to chain elongation
(Fig. 4). Subsequent condensation catalyzed by the KS domain between the peptidyl
intermediate and malonyl-S-ACP results in the elongation of the growing peptide with a
carboxylic acid. Equally striking are the discoveries that the ACP domain of BlmVIII is
more similar to a PCP than to an ACP and that the C domain of BlmVII has an additional
N-terminal segment of about 50 amino acids that is rich in arginine, aspartic acid, and glutamic
acid. The latter feature is analogous to the N-terminal interpolypeptide linker for type I PKS,
which has recently been demonstrated to play a critical role in intermodular communication
(Gokhale et al. (1999) Science 284: 482-485). We propose that these unique features of the
ACP domain from the BlmVIII PKS module and the C domain from the BlmVII NRPS
module provide the molecular basis for the C domain to recognize the acyl-S-ACP as a
substrate. Subsequent condensation catalyzed by the C domain between acyl-S-ACP and
aminoacyl-S-PCP results in the elongation of the growing polyketide (as far as this
condensation is concerned) with an amino acid (Fig. 4).

Novel domains for the Blm NRPS and PKS modules.

Various NRPS and PKS domains have been characterized, which are the 
building blocks for the entire field of combinatorial biosynthesis. The success of
combinatorial biosynthesis depends critically upon the repertoire of these individual
domains. Genetic analysis of the blm gene cluster has uncovered several novel NRPS and
PKS domains. Without being bound to a particular theory, it is believed that BlmVI and
BlmV are involved in the biosynthesis of the β-aminoalaninamide and pyrimidine moieties of
BLM. In addition, the MT domain in BlmVIII, the Cy domains in BlmIV, and the Ox
domain in BlmIII are novel domains.

The BlmVIII PKS module apparently furnishes the "propionate" unit into 
BLM in two steps by evolving a malonyl CoA-specifying AT domain coupled with a novel 
S-adenosylmethionine-requiring MT domain, representing a new mechanism to introduce 
methyl branches into polyketides (Fig. 4). This biosynthetic reaction sequence is 
unprecedented for polyketide biosynthesis since all PKSs from actinomycetes examined to 
date incorporate the alkyl branches into the resultant polyketides by selecting various alkyl 
malonates as the extending units that are determined by the AT domains. Yet, feeding 
experiments have unambiguously established that the polyketide moiety of BLM was 
derived from an acetate and a methionine (Takita and Muroka (1990) pages 289-309 in
Biochemistry of Peptide Antibiotics: Recent Advances in the Biotechnology of β-Lactams and
Microbial Peptides, Kleinkauf, H. & von Döhren, H. eds., W. de Gruyter, N.Y.), a fact that
fits well with the observed unusual domain organization of the BlmVIII PKS module (Fig.
4). It is conceivable that the combination of this MT domain with an AT domain specific for
a methylmalonate extending unit (Haydock et al. (1995) FEBS Lett. 374: 246-248) could
result in the synthesis of polyketides with a gem-dimethyl moiety via engineering polyketide
biosynthesis. Such a gem-dimethyl group has been found to be a very important
pharmacophore for the epothilones, a family of hybrid peptide and polyketide metabolites
that exhibit remarkable antitumor activity similar to that of taxol (Ojima et al. (1999) Proc.
Natl. Acad. Sci. USA 96: 4256-4261).

The BlmIV and BlmIII NRPSs are characterized by the unusual Cy domains
as well as the unprecedented Ox domain, providing an efficient biosynthesis for a bithiazole
structure. The Cy domain was first defined by Marahiel and co-workers in their study of
bacitracin biosynthesis in B. licheniformis (Konz et al. (1997) Chem. Biol. 4: 927-937), and
the Cy activity was demonstrated recently by Walsh and co-workers in their study of the
HMWP1 and HMWP2 proteins for yersiniabactin biosynthesis in Y. pestis (Gehring et al.
(1998) Chem. Biol. 5: 573-586; Gehring et al. (1998) Biochemistry 37: 11637-11650).
While thiazoline is the direct product of the Cy domain, the thiazoline-to-thiazole conversion
requires an additional oxidation step. We identified at the C-terminus of NRPS-0 an
additional domain that shows low, but significant, sequence homology to a family of putative
oxidases/dehydrogenases, including the McbC protein of the microcin B17 synthase (Table
1). Microcin B17 synthase catalyzes the synthesis of the oxazole- and thiazole-containing
peptide antibiotic microcin B17, and McbC has been proposed to play a role in catalyzing the
oxazoline/thiazoline-to-oxazole/thiazole conversion (Li et al. (1996) Science 274: 1188-
1193; Milne et al. (1999) Biochemistry 38: 4768-4781). Consequently, we propose that this
extra domain at the C-terminus of NRPS-0 could provide the oxidase/dehydrogenase activity
needed for the biosynthesis of the bithiazole moiety of BLM, defining a novel Ox domain for
NRPSs. It is noteworthy that a cell-free preparation from Sv ATCC 15003 has been reported
to catalyze the conversion of phleomycins to BLMs in the presence of NAD+ (Takita and
Muroka (1990) pages 289-309 in Biochemistry of Peptide Antibiotics: Recent Advances in
the Biotechnology of β-Lactams and Microbial Peptides, Kleinkauf, H. & von Döhren, H.
eds., W. de Gruyter, N.Y.), supporting the hypothesis that the bithiazole moiety of BLM
results from stepwise oxidations of a bithiazoline precursor (Fig. 1A). (The phleomycin
producer could be imagined to result from the loss of its Ox activity for the first thiazoline
ring.) Given the wide distribution of thiazole or oxazole rings in natural products (Ojima et
al. (1999) Proc. Natl. Acad. Sci. USA 96: 4256-4261; Li et al. (1996) Science 274: 1188-
1193) exhibiting an impressive array of biological activities, the cloning of the blmIV/III
genes and the identification of the Ox domain open many opportunities to define the
mechanism for thiazole biosynthesis and to potentially synthesize novel thiazole-containing
molecules by engineering peptide biosynthesis.

Example 2

Identification and characterization of a type II peptidyl carrier protein from the
bleomycin producer Streptomyces verticillus ATCC 15003.

Results.

Cloning and sequence analysis of the blmI gene

In our effort to clone the gene cluster responsible for BLM biosynthesis, we
have determined 80 kb of DNA sequence from Sv ATCC15003 (Fig. 8). Among the orfs
identified within the blm gene cluster is a small orf of 273 base pairs (bp), blmI, which is
located approximately 4 kb upstream of the previously characterized blmAB resistance locus
(Sugiyama et al. (1994) Gene 151: 11-16; Calcutt and Schmidt (1994) Gene 151: 17-21)
(Fig. 8B). The blmI gene encodes a protein of 90 amino acids with a molecular weight of
9957 and a pI of 6.52 (Fig. 8C). Computer-assisted analysis (Altschul et al. (1997) Nucleic
Acids Res. 25: 3389-3402) of the deduced amino acid sequence indicates that BlmI is very
similar to various PCP domains of NRPSs (around 40% identity and 60% similarity,
as shown in Figure 9). Like known PCP domains of NRPSs, BlmI has the highly conserved
signature motif LGGXS, within which the serine residue is the site for
4'-phosphopantetheinylation (Stachelhaus and Marahiel (1995) FEMS Microbiol. Lett. 125: 3-
14; Marahiel et al. (1997) Chem. Rev. 97: 2651-2673). This posttranslational
modification is generally necessary for peptide biosynthesis, converting the apo-PCP into the
functional holo-PCP (Marahiel et al. (1997) Chem. Rev. 97: 2651-2673; Walsh et al. (1997)
Curr. Opin. Chem. Biol. 1: 309-315). Based on sequence comparison, BlmI is most closely
related to PCPs rather than to other kinds of carrier proteins that also share the same LGGXS
(SEQ ID NO:80) motif and undergo the same posttranslational 4'-phosphopantetheinylation, such
as the E. coli acyl carrier protein (ACP) (Lambalot and Walsh (1995) J. Biol. Chem. 270:
24658-24661), the ACP domain of type I PKS and the type II PKS ACP (Cox and Simpson
(1997) FEBS Lett. 405: 267-272; Carreras et al. (1997) Biochemistry 36: 11757-11761), the
ArCP domain (Gehring et al. (1998) Biochemistry 37: 2648-2659), and several nodulation
related ACP-like proteins (Epple et al. (1998) J. Bacteriol. 180: 4950-4954; Spaink et al.
(1991) Nature 354: 125-130).
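
Two of the checks in the preceding paragraph lend themselves to a short sketch: locating the LGGXS signature motif in a protein sequence, and confirming that a 273 bp orf (stop codon included) corresponds to 90 amino acids. The function names and the example peptide in the usage note are hypothetical illustrations; the example peptide is not the BlmI sequence.

```python
import re
from typing import List

# LGGXS: X is any residue; the trailing serine is the attachment site
# for the 4'-phosphopantetheine group.
PCP_MOTIF = re.compile(r"LGG.S")


def ppant_serine_positions(protein: str) -> List[int]:
    """Return the 1-based position of the motif serine for every
    LGGXS occurrence in the protein sequence."""
    # m.start() is the 0-based index of L; the serine is four residues later.
    return [m.start() + 5 for m in PCP_MOTIF.finditer(protein.upper())]


def orf_protein_length(orf_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an orf of the given length in bp."""
    if orf_bp % 3 != 0:
        raise ValueError("orf length must be a multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons
```

For instance, `ppant_serine_positions("MKTLGGHSLA")` reports the serine at position 8 of that made-up peptide, and `orf_protein_length(273)` gives 90, consistent with the 90 amino acids stated for BlmI.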

Overexpression of blmI in E. coli

To overexpress the blmI gene in E. coli, we directly amplified the blmI gene
by PCR from Sv ATCC15003 genomic DNA and cloned it into the pQE-60 expression
vector to give pBS1 so that BlmI could be produced as a protein with a native N-terminus
and a His6-tag at its C-terminus. However, no production of the BlmI protein was detected,
as judged by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE), upon
introduction of pBS1 into E. coli M15(pREP4) under the standard overexpression conditions
recommended by the manufacturer (Qiagen). We reasoned that the small BlmI protein with
its native N-terminus may not be stable in the heterologous host, and hence moved the blmI
gene from pBS1 into pET-29a to yield the second overexpression construct, pBS2. In the
latter construct, BlmI should be produced as a fusion protein with 27 extra amino acid
residues at its N-terminus, including an S-tag and the thrombin cleavage site, in addition to
the His6-tag at its C-terminus. Introduction of pBS2 into E. coli BL21(DE3) under the
standard overexpression conditions recommended by the manufacturer (Novagen) indeed
resulted in overproduction of BlmI. In fact, the bulk of the soluble protein was the
overproduced BlmI, which was easily purified by affinity chromatography using Ni-NTA
resin (Qiagen). It is noteworthy that fusion of the additional 23 amino acids to the
N-terminus of BlmI as in pBS2 and change of the expression system from E. coli M15(pREP4)
(pBS1) to E. coli BL21(DE3)(pBS2) dramatically improved the expression level of blmI.

In vivo 4'-phosphopantetheinylation of the BlmI protein

To establish BlmI as a type II PCP, we tested if it could serve as a substrate
for a PCP-specific 4'-phosphopantetheinyl transferase (PPTase). PPTases catalyze the
posttranslational modification of an apo-PCP into a holo-PCP by transferring the
4'-phosphopantetheine moiety from coenzyme A (CoA) to the conserved serine residue of PCP,
and this reaction has been developed recently into a general method to prepare various
holo-PCP, holo-ACP, or holo-ArCP proteins from the corresponding apoproteins (Stachelhaus
et al. (1996) Chem. Biol. 3: 913-921; Gehring et al. (1998) Biochemistry 37: 2648-2659;
Gehring et al. (1998) Biochemistry 37: 11637-11650; Weinreb et al. (1998) Biochemistry 37:
1575-1584). Therefore, we decided to investigate the 4'-phosphopantetheinylation of BlmI
under both in vivo (Ku et al. (1997) Chem. Biol. 4: 203-207) and in vitro (Gehring et al.
(1998) Biochemistry 37: 11637-11650; Lambalot et al. (1996) Chem. Biol. 3: 923-936; Quadri
et al. (1998) Biochemistry 37: 1585-1595) conditions.

To examine 4'-phosphopantetheinylation of BlmI in vivo, we chose E. coli
OG7001 as the expression host, which is a β-alanine auxotroph derived from E. coli
BL21(DE3) by P1 co-transduction of the panD mutation from E. coli SJ16 (Epple et al.
(1998) J. Bacteriol. 180: 4950-4954). Upon introduction of pBS2 into E. coli OG7001, blmI
was exceptionally well expressed and the overproduced BlmI protein was readily purified.
However, high performance liquid chromatography (HPLC) analysis showed that the
purified BlmI was essentially in the apo-form (Fig. 10A), indicating that apo-BlmI was a
poor substrate for the E. coli endogenous PPTases, such as EntD and ACP synthase
(Lambalot et al. (1996) Chem. Biol. 3: 923-936; Walsh et al. (1997) Curr. Opin. Chem. Biol.
1: 309-315; Lambalot and Walsh (1995) J. Biol. Chem. 270: 24658-24661). To circumvent
the poor endogenous PPTase activity, we next co-expressed blmI with the gsp gene, which
was isolated from the gramicidin S producer Bacillus brevis and encodes a PPTase
known to 4'-phosphopantetheinylate heterologously produced PCPs in E. coli (Lambalot et
al. (1996) Chem. Biol. 3: 923-936; Ku et al. (1997) Chem. Biol. 4: 203-207). We
co-transformed pDPT-Gsp, in which the expression of the gsp gene was under the control of the
T5/Lac promoter (Ku et al. (1997) Chem. Biol. 4: 203-207), and pBS2 into E. coli OG7001.
BlmI was again very well expressed and the resulting BlmI protein was similarly purified.
HPLC analysis showed that at least 60% of the overproduced BlmI was modified into the
holo-BlmI protein (Fig. 10B). (A PCP domain was similarly 4'-phosphopantetheinylated in vivo
before by co-expressing gsp in E. coli using pDPT-Gsp, and approximately 80% of the PCP
was produced in the holo-form (Ku et al. (1997) Chem. Biol. 4: 203-207).)

We next cultured E. coli OG7001(pBS2) and E. coli OG7001(pBS2/pDPT-Gsp)
in the presence of [3-³H]-β-alanine, a known biosynthetic precursor of
4'-phosphopantetheine (Stachelhaus et al. (1996) Chem. Biol. 3: 913-921; Epple et al. (1998) J.
Bacteriol. 180: 4950-4954). Specific incorporation of [3-³H]-β-alanine into the
4'-phosphopantetheine moiety of holo-BlmI was determined by autoradiographic analysis.
Thus, while fermentation of E. coli OG7001(pBS2) in the presence of [3-³H]-β-alanine led
to an IPTG-dependent overproduction of BlmI, little of the resulting BlmI protein was
³H-labeled, indicating that it was produced in the apo-form. In contrast, fermentation of E. coli
OG7001(pBS2/pDPT-Gsp) in the presence of [3-³H]-β-alanine resulted in a significant
increase of IPTG-dependent incorporation of the ³H-label into the overproduced BlmI
protein, suggesting a specific incorporation of [3-³H]-β-alanine into holo-BlmI, presumably
in the 4'-phosphopantetheine moiety. There were several additional proteins that were also
weakly labeled by [3-³H]-β-alanine. However, both their expression and their incorporation
of the ³H-label were independent of either IPTG induction or the presence of Gsp; hence
these proteins were unrelated to BlmI. (Similar background labeling was reported before for
in vivo 4'-phosphopantetheinylation of other PCPs (Epple et al. (1998) J. Bacteriol. 180:
4950-4954).) We also purified the BlmI protein from E. coli OG7001(pBS2/pDPT-Gsp) and
demonstrated that it was the holo-BlmI protein that was specifically associated with the
³H-activity. Finally, we confirmed the identity of holo-BlmI by subjecting the purified BlmI
protein to MALDI-TOF mass spectral analysis (Weinreb et al. (1998) Biochemistry 37: 1575-
1584). BlmI produced in the absence of the Gsp PPTase yielded a single peak with a
molecular weight of 13,952, suggesting that the produced BlmI protein is in the apo-form
(calc. 13,949). In contrast, BlmI produced in the presence of Gsp yielded two species with
molecular weights of 13,969 and 14,303, respectively. While the species with the molecular
weight of 13,969 represents apo-BlmI, a molecular weight of 14,303 unambiguously
confirmed the other protein as holo-BlmI (calc. 14,289). The latter result indicated that the
purified BlmI consisted of both the apo- and holo-BlmI proteins, in agreement with the
HPLC analysis results (Fig. 10B).
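
The relationship between the calculated apo and holo masses quoted above can be checked with a short worked calculation. The sketch below assumes, from standard chemistry rather than from anything stated in this text, that the modification adds the elements of 4'-phosphopantetheine (C11H23N2O7PS) minus one water, i.e. a net C11H21N2O6PS, and it uses rounded average atomic masses; the function names are illustrative only.

```python
# Rounded average atomic masses in g/mol (assumed standard values).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007,
               "O": 15.999, "P": 30.974, "S": 32.06}

# Net atoms added to the serine on modification:
# 4'-phosphopantetheine (C11H23N2O7PS) minus H2O.
PPANT_ADDED = {"C": 11, "H": 21, "N": 2, "O": 6, "P": 1, "S": 1}


def formula_mass(formula: dict) -> float:
    """Average mass of a molecular formula given as {element: count}."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def expected_holo_mass(apo_mass: float) -> float:
    """Expected mass of the holo protein for a given apo protein mass."""
    return apo_mass + formula_mass(PPANT_ADDED)


if __name__ == "__main__":
    shift = formula_mass(PPANT_ADDED)
    print(f"mass shift on modification: {shift:.2f}")
    print(f"expected holo mass from calc. apo mass 13,949: "
          f"{expected_holo_mass(13949):.0f}")
```

Under these assumptions the shift is about 340 mass units, so the calculated apo mass of 13,949 corresponds to the calculated holo mass of 14,289; the two observed peaks (13,969 and 14,303) differ by 334, close to that expected shift.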

In vitro 4'-phosphopantetheinylation of the BlmI protein

To investigate 4'-phosphopantetheinylation of BlmI in vitro, we chose the Sfp
protein as the preferred PPTase, which had been isolated before from the surfactin producer
Bacillus subtilis (Nakano et al. (1992) Mol. Gen. Genet. 232: 313-321). (Overexpression of
gsp in E. coli using pDPT-Gsp resulted in predominantly insoluble Gsp protein (Ku et al.
(1997) Chem. Biol. 4: 203-207).) The Sfp PPTase was overproduced in E. coli
MV1190(pUC8-Sfp) and purified to near homogeneity as described before (Quadri et al.
(1998) Biochemistry 37: 1585-1595; Nakano et al. (1992) Mol. Gen. Genet. 232: 313-321).
Upon incubation of the purified apo-BlmI with [³H-pantetheine]-CoA in the presence of the
Sfp PPTase, we examined the covalent incorporation of the [³H-pantetheine]-4'-
phosphopantetheine moiety from CoA into holo-BlmI by autoradiographic analysis. Indeed,
the apo-BlmI was quantitatively labeled by [³H-pantetheine]-CoA, and no labeling was
observed in the absence of either the apo-BlmI or the Sfp PPTase protein, demonstrating that
the Sfp PPTase can recognize apo-BlmI as a substrate and specifically transfer the
4'-phosphopantetheine group from CoA to apo-BlmI to form holo-BlmI.

In vitro aminoacylation of BlmI

Once we established BlmI as a type II PCP that can be readily modified by
PCP-specific PPTases into the holo-BlmI protein, we tested if the holo-BlmI could be
aminoacylated in trans, which requires an A domain. Since BlmI has no cognate A domain of its
own, we turned our attention to another putative biosynthesis gene cluster we have cloned
previously from Sv ATCC15003, which encodes at least four NRPS modules and one PKS module.
We have established that this gene cluster is not clustered with the blm locus and is unrelated
to BLM biosynthesis. From this gene cluster, we amplified by PCR a 1579 bp fragment
encoding an A domain, named Val-A, which we predicted to have a molecular weight of
56,581 and a pI of 7.39. We cloned val-A into pET-28a to yield pBS3, in which Val-A
would be produced as a fusion protein with a His6-tag at the N-terminus. Introduction of
pBS3 into E. coli BL21(DE3) under the standard overexpression conditions recommended
by the manufacturer (Novagen) resulted in good overproduction of Val-A, predominantly in
soluble form, from which Val-A was purified by affinity chromatography using Ni-NTA
resin. The purified Val-A protein was active by the amino acid-dependent ATP-PPi
exchange assay (Lee and Lipmann (1970) Methods Enzymol. 43: 585-602; Ku et al. (1997)
Chem. Biol. 4: 203-207). Among the 23 amino acids tested, Val-A specifically activated
valine, an amino acid that is not required for BLM biosynthesis.

To carry out the aminoacylation in trans, we incubated the purified holo-BlmI
and Val-A in vitro in the presence of L-[¹⁴C(U)]valine and ATP (Stachelhaus et al. (1996)
Chem. Biol. 3: 913-921; Weinreb et al. (1998) Biochemistry 37: 1575-1584). The
aminoacylated holo-BlmI-L-[¹⁴C(U)]valine species was subjected to SDS-PAGE, and specific
attachment of L-[¹⁴C(U)]valine to holo-BlmI was determined by autoradiographic analysis.
Remarkably, the holo-BlmI was specifically labeled by L-[¹⁴C(U)]valine in the presence of
Val-A, indicative of the formation of the holo-BlmI-S-valine thioester. The in trans
aminoacylation between the holo-BlmI and Val-A proteins appeared to be very specific.
Neither incubation of L-[¹⁴C(U)]valine with Val-A, the apo-BlmI, or the holo-BlmI protein
alone, nor incubation of L-[¹⁴C(U)]valine with the Val-A and apo-BlmI proteins, resulted in
the detection of ¹⁴C-labeled BlmI protein.

Discussion. 

Nonribosomal peptides and polyketides are two distinct classes of natural
products, yet they are assembled from amino acids and short carboxylic acids by NRPSs and
PKSs, respectively, by strikingly similar strategies (Cane et al. (1998) Science 282: 63-68).
These fascinating multifunctional enzyme complexes have been classified into two types
based on their gene organization and enzyme architecture. Type I enzymes are
multifunctional proteins consisting of domains for individual enzyme activities, and type II
enzymes are multienzyme complexes consisting of discrete proteins that are largely
monofunctional. While both type I and type II PKSs (Fig. 11A and 11C) have been well
characterized to account for the vast structural diversity found in polyketide biosynthesis
(Hopwood (1997) Chem. Rev. 97: 2465-2497), all NRPSs studied so far are exclusively
type I modular enzymes (Fig. 11B) (Kleinkauf and von Döhren, H. (1996) Eur. J. Biochem.
236: 335-351; Marahiel et al. (1997) Chem. Rev. 97: 2651-2673; von Döhren et al. (1997)
Chem. Rev. 97: 2675-2705). It is very tempting to speculate on the existence of a type II NRPS
that, analogous to type II PKS (Shen and Hutchinson (1993) Science 262: 1535-1540; Bao et
al. (1998) Biochemistry 37: 8132-8138; Carreras and Khosla (1998) Biochemistry 37: 2084-
2088), should consist of discrete proteins possessing enzyme activities such as the A
(Stachelhaus and Marahiel (1995) J. Biol. Chem. 270: 6163-6169), the PCP (Stein and Morris
(1996) J. Biol. Chem. 271: 15428-15435), or the C (Stachelhaus et al. (1998) J. Biol. Chem.
273: 22773-22781) domains of type I NRPSs (Fig. 11D). The fact that both the A
(Stachelhaus and Marahiel (1995) J. Biol. Chem. 270: 6163-6169; Konz et al. (1997) Chem.
Biol. 4: 927-937; Weinreb et al. (1998) Biochemistry 37: 1575-1584; Mootz and Marahiel
(1997) J. Bacteriol. 179: 6843-6850) and the PCP (Stachelhaus et al. (1996) Chem. Biol. 3:
913-921; Weinreb et al. (1998) Biochemistry 37: 1575-1584; Pfeifer et al. (1995)
Biochemistry 34: 7450-7459; Haese et al. (1994) J. Mol. Biol. 243: 116-122; Lambalot et al.
(1996) Chem. Biol. 3: 923-936; Quadri et al. (1998) Biochemistry 37: 1585-1595; Gehring et
al. (1996) Chem. Biol. 4: 17-24; Ku et al. (1997) Chem. Biol. 4: 203-207) domains of type I
NRPSs can act as independent enzymes supports the hypothesis of a type II NRPS.

We have now cloned and sequenced the blml gene, overproduced and 
characterized the Blml protein as a bona fide type II PCP, and demonstrated that holo-Blml 
can be aminoacylated by a completely unrelated A domain, providing for the first time 
genetic and biochemical evidence for a type II NRPS enzyme. We concluded Blml as a type 
II PCP based on the following criteria. (1) The deduced amino acid sequence of the blml 
gene is highly homologous to various PCP domains of known NRPSs, in particular at the 
signature motif of LGGXS within which the 4'-phosphopantetheine prosthetic group is 
covalently attached to the serine residue (Marahiel et al. (1997) Chem. Rev. 97: 2651-2673; 
Stachelhaus and Marahiel (1995) FEMS Microbiol. Lett. 125: 3-14). While the current 
boundaries for a PCP domain in the literature were defined arbitrarily (Stachelhaus et al. 
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(1996) Chem. Biol. 3: 9 13-921) and varied from one PCP to another, we can now re-define a 
PCP domain for the type I NRPS as a 90 amino acid peptide with approximately 45 amino 
acids, each flanking the essential serine residue in the LGGXS (SEQ ID NO:81) motif, in 
light of this discrete Blml type II PCP (Fig.9). (2) The blml gene has been successfully 
expressed in E. coli, and fusion of a short peptide to the N-terminus of Blml dramatically 
improved its overproduction efficiency. While we cannot exclude the effect of different 
systems on gene expression, i.e., E. coli M15(pREP4)(pBSl) vs. E. coli BL21(DE-3)(pBS2), 
we attribute the increase in expression efficiency to the stability of Blml as an N-terminal 
fusion protein instead of the otherwise labile Blml protein with its native N-terminus. Since 
BlmI was produced predominantly in the apo-form in E. coli, apo-BlmI apparently was not a
substrate for the endogenous PPTases, such as EntD or ACP synthase, excluding BlmI as an
ArCP or ACP, respectively. EntD and ACP synthase are known to 4'-
phosphopantetheinylate apo-ArCP and ACP, respectively, to their holo-forms efficiently
(Lambalot et al. (1996) Chem. Biol. 3: 923-936; Walsh et al. (1997) Curr. Opin. Chem. Biol.
1: 309-315; Lambalot and Walsh (1995) J. Biol. Chem. 270: 24658-24661). (3) The apo-
BlmI protein serves as a substrate for PCP-specific PPTases that transfer the 4'-
phosphopantetheine moiety from CoA to apo-BlmI to yield the holo-BlmI protein. We have
demonstrated this posttranslational modification for BlmI in vivo with the Gsp PPTase (Ku
et al. (1997) Chem. Biol. 4: 203-207) and in vitro with the Sfp PPTase (Gehring et al. (1998)
Biochemistry 37: 11637-11650; Lambalot et al. (1996) Chem. Biol. 3: 923-936; Quadri et al.
(1998) Biochemistry 37: 1585-1595), both of which have been extensively used in preparing
holo-PCPs. (4) The specific modification of apo-BlmI by 4'-phosphopantetheinylation has
been monitored by HPLC analysis (Fig. 10) (Weinreb et al. (1998) Biochemistry 37: 1575- 
1584) and by specific incorporation of [3-³H]-β-alanine in vivo (Stachelhaus et al. (1996)
Chem. Biol. 3: 913-921; Ku et al. (1997) Chem. Biol. 4: 203-207; Epple et al. (1998) J.
Bacteriol. 180: 4950-4954) and of [³H-pantetheine]-CoA in vitro (Gehring et al. (1998)
Biochemistry 37: 11637-11650; Lambalot et al. (1996) Chem. Biol. 3: 923-936; Quadri et al.
(1998) Biochemistry 37: 1585-1595), respectively, into the 4'-phosphopantetheine moiety of
the holo-BlmI protein. The identity of BlmI was finally confirmed by MALDI-TOF mass
spectral analysis that determined the molecular weight for both the apo- and holo-BlmI
proteins.
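
By way of illustration only, the PCP domain definition proposed above (roughly 45 residues on each side of the serine in the LGGXS motif) can be sketched as a simple sequence search. The function name `pcp_window`, the `flank` parameter, and the toy sequence are hypothetical and are not derived from the BlmI sequence; X is treated as any residue.

```python
import re

def pcp_window(protein, flank=45):
    """Return (start, end, subsequence) for each LGGXS motif.

    The window is centred on the conserved serine (last residue of the
    motif) and extends `flank` residues on each side, clipped to the
    ends of the protein.
    """
    windows = []
    for match in re.finditer(r"LGG.S", protein):
        ser = match.start() + 4                    # 0-based index of the serine
        start = max(0, ser - flank)                # clip at the N-terminus
        end = min(len(protein), ser + flank + 1)   # clip at the C-terminus
        windows.append((start, end, protein[start:end]))
    return windows

# Hypothetical toy sequence: 60 Ala, one LGGHS motif, 60 Ala.
toy = "A" * 60 + "LGGHS" + "A" * 60
start, end, domain = pcp_window(toy)[0]
```

With the default flank the window is 91 residues long (45 + serine + 45), which corresponds to the approximately 90 amino acid domain described in the text.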

While individual domains of type I NRPSs can function independently and 
several A (Stachelhaus and Marahiel (1995) J. Biol. Chem. 270: 6163-6169; Konz et al.
(1997) Chem. Biol. 4: 927-937; Weinreb et al. (1998) Biochemistry 37: 1575-1584; Mootz
and Marahiel (1997) J. Bacteriol. 179: 6843-6850) and PCP (Stachelhaus et al. (1996)
Chem. Biol. 3: 913-921; Weinreb et al. (1998) Biochemistry 37: 1575-1584; Pfeifer et al.
(1995) Biochemistry 34: 7450-7459; Haese et al. (1994) J. Mol. Biol. 243: 116-122;
Lambalot et al. (1996) Chem. Biol. 3: 923-936; Quadri et al. (1998) Biochemistry 37: 1585-
1595; Gehring et al. (1997) Chem. Biol. 4: 17-24; Ku et al. (1997) Chem. Biol. 4: 203-207)
domains have been overproduced, purified, and biochemically characterized, aminoacylation
in trans has been successful only between PCPs and their cognate A domains (Stachelhaus et
al. (1996) Chem. Biol. 3: 913-921; Weinreb et al. (1998) Biochemistry 37: 1575-1584). No
aminoacylation between PCP and A domains from different NRPS modules has been 
observed. These results led to the conclusion that there is a specific protein-protein 
recognition between the A domain and its cognate PCP (Weinreb et al. (1998) Biochemistry 
37: 1575-1584). Such domain-specific aminoacylation, in fact, should be beneficial in 
maintaining the fidelity of a type I NRPS by providing additional "gating" against 
misincorporation of non-specifically activated aminoacyl adenylate into the final peptide 
product. Since a type II PCP such as BlmI lacks its cognate A domain, we asked if BlmI
could be aminoacylated by an unrelated A domain of a type I NRPS. Although we have yet
to determine the biochemical role of BlmI in vivo, the fact that the blmI gene is located in the
middle of the blm gene cluster suggests that it may be involved in BLM biosynthesis. To
avoid the ambiguity of selecting an A domain that may potentially interact with BlmI in
vivo, we preferred not to choose any A domain from the blm gene cluster to test if it could
aminoacylate BlmI in trans. We reasoned that an A domain that is unrelated to BlmI should
come from a gene cluster independent of BLM biosynthesis and should activate an amino
acid not required by BLM. We chose Val-A because it satisfied both requirements. Val-A is
an A domain of a type I NRPS from a gene cluster we have cloned previously from Sv
ATCC15003 that has proven to be unrelated to BLM biosynthesis, and it specifically
activates valine among the 23 amino acids tested. Remarkably, BlmI was efficiently
aminoacylated by Val-A. The valine residue is specifically attached in a thioester linkage to
the terminal -SH of the 4'-phosphopantetheine moiety of the holo-BlmI protein, as evidenced
by the fact that the apo-BlmI was inactive under the identical conditions.

Aminoacylation of holo-BlmI by Val-A represents the first example in which
an A domain aminoacylates a protein other than its cognate PCP domain. Since it has been
suggested that an A domain of a type I NRPS can transfer the activated aminoacyl adenylate
only to its cognate PCP domain because of the specific protein-protein recognition between
the two domains (Weinreb et al. (1998) Biochemistry 37: 1575-1584), the fact that BlmI is
aminoacylated by Val-A revealed a distinct feature of a type II PCP. It is very tempting to
speculate that type II PCPs such as BlmI may have broad intrinsic substrate specificity
toward either the aminoacyl adenylate, the A domain, or both. In fact, the latter feature is
reminiscent of the type II PKS ACPs, which have been shown to be interchangeable among
different PKS complexes (Shen and Hutchinson (1993) Science 262: 1535-1540; Bao et al.
(1998) Biochemistry 37: 8132-8138; Carreras and Khosla (1998) Biochemistry 37: 2084-
2088). The biosynthesis of D-alanyl-lipoteichoic acid in Bacillus subtilis (Perego et al.
(1995) J. Biol. Chem. 270: 15598-15606) and Lactobacillus casei (Debabov et al. (1996)
178: 3869-3876) also involves a discrete ACP-like protein, the D-alanyl carrier protein,
although the latter clearly is structurally and functionally different from PCPs.

The results strongly suggest the existence of a type II NRPS. In fact, we have
already identified within the blm gene cluster two additional genes, blmII and blmXI (Fig.
1B), which encode type II C proteins based on sequence analysis (see Example 1).

Significance.

All NRPSs known to date are exclusively the type I modular enzymes that are 
multifunctional proteins consisting of domains, such as A (Stachelhaus and Marahiel (1995) J.
Biol. Chem. 270: 6163-6169), PCP (Stachelhaus et al. (1996) Chem. Biol. 3: 913-921), and C
(Stachelhaus et al. (1998) J. Biol. Chem. 273: 22773-22781), for individual enzyme activities
(Kleinkauf and von Döhren (1996) Eur. J. Biochem. 236: 335-351; Marahiel et al. (1997)
Chem. Rev. 97: 2651-2673; von Döhren et al. (1997) Chem. Rev. 97: 2675-2705), and
control the structural variations of the resulting peptide products by the multiple-carrier
thiotemplate mechanism (Cane et al. (1998) Science 282: 63-68; Stein and Morris (1996) J.
Biol. Chem. 271: 15428-15435). While individual domains of type I NRPSs can function
independently, aminoacylation in trans has been successful only between PCPs and their
cognate A domains (Stachelhaus et al. (1996) Chem. Biol. 3: 913-921; Weinreb et al. (1998)
Biochemistry 37: 1575-1584). We have cloned and sequenced the blmI gene, overproduced
and characterized the BlmI protein as a bona fide type II PCP, and demonstrated that the
holo-BlmI can be aminoacylated by a completely unrelated A domain. Our results provide
for the first time genetic and biochemical evidence to support the hypothesis of a type II
NRPS, setting the stage for formulating new research concepts to study peptide biosynthesis.
Genetic manipulation of type I NRPS has already been successful in generating novel
peptides (Stachelhaus et al. (1995) Science 269: 69-72). An unprecedented type II NRPS
should shed new light on engineering NRPS proteins, greatly increasing our ability to access
peptides with even greater structural diversity.

Materials and methods 

General DNA manipulations 

Plasmid preparation and DNA extraction were carried out using
commercial kits (Qiagen, Santa Clarita, CA), and all other manipulations were carried out
according to standard methods (Sambrook et al. (1989) Molecular Cloning: A Laboratory
Manual, 2nd ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, USA). E. coli
strain DH5α was used as the host for general DNA propagation.

Overexpression of blmI in E. coli and purification of the BlmI protein

The blmI gene was amplified from Sv ATCC15003 by PCR using a forward
primer of 5'-CCG CCCATGGGT GCT CCG CGT GGC GAG CGG ACC CGG CGC-3'
(SEQ ID NO:82, the NcoI site is underlined) and a reverse primer of 3'-CCT AGATCT
CCG GTC CCG CTC CCC CGT-5' (SEQ ID NO:83, the BglII site is underlined). In order
to create the NcoI site, the original starting sequence of "ATG AGC" was changed to
"ATG GGT", which changed the second amino acid from serine to glycine.
The first five codons of blmI were also optimized for overexpression in E. coli. The PCR-
amplified 0.3 kb NcoI-BglII fragment was cloned into the same sites of pQE-60 (Qiagen)
to form pBS1. Digestion of pBS1 with NcoI and HindIII and cloning of the resulting 0.3 kb
NcoI-HindIII fragment into the same sites of pET-29a (Novagen, Madison, WI) yielded
pBS2.
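
By way of illustration only, the primer design described above can be checked with a short script that locates the engineered restriction sites and translates the modified start of the gene. The primer strings are copied from the text with spaces removed; the recognition sequences CCATGG (NcoI) and AGATCT (BglII) are the standard ones; the helper names and the three-entry codon table are hypothetical conveniences and not part of the original procedure.

```python
# Primer sequences copied from the text (spaces removed). The reverse
# primer is kept in the order in which it is printed in the text.
FORWARD = "CCGCCCATGGGTGCTCCGCGTGGCGAGCGGACCCGGCGC"
REVERSE_AS_WRITTEN = "CCTAGATCTCCGGTCCCGCTCCCCCGT"

# Both recognition sites are palindromic, so they read the same on either strand.
SITES = {"NcoI": "CCATGG", "BglII": "AGATCT"}

# Partial codon table: only the codons needed for this example.
CODONS = {"ATG": "M", "AGC": "S", "GGT": "G"}

def find_site(seq, enzyme):
    """Return the 0-based index of the enzyme's site in seq, or -1 if absent."""
    return seq.find(SITES[enzyme])

def translate(dna):
    """Translate an in-frame DNA string using the partial table above."""
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))

ncoi_pos = find_site(FORWARD, "NcoI")
bglii_pos = find_site(REVERSE_AS_WRITTEN, "BglII")
native_start = translate("ATGAGC")      # original start of the gene
engineered_start = translate("ATGGGT")  # start after creating the NcoI site
```

The translation step reproduces the serine to glycine change at the second residue that results from introducing the NcoI site.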

Expression of blmI in E. coli M15(pREP4)(pBS1) and in E. coli BL21(DE3)(pBS2)
and purification of the resulting BlmI protein by affinity chromatography on Ni-
NTA resin were carried out under the standard conditions recommended by Qiagen and
Novagen, respectively. The incubation temperature was lowered to 30 °C to improve
solubility. The purification of BlmI was monitored by SDS-PAGE on a 15% gel. The final
pure BlmI protein was desalted on a PD-10 column (Sephadex G-25, Pharmacia Biotech,
Piscataway, NJ) into 50 mM sodium phosphate buffer, pH 7.8, containing 200 mM NaCl, 10
mM MgCl2, 2 mM dithiothreitol (DTT), 1 mM EDTA, and 10% glycerol, and stored at -80 °C
for in vitro assays.

HPLC analysis and MALDI-TOF mass spectral determination

Samples of BlmI (30-70 µg) purified from E. coli OG7001(pBS2) or E. coli
OG7001(pBS2/pDPT-Gsp) were analyzed on a Nova-Pak C18 column (5mm x 10, Waters,
Milford, MA) using a Rainin DMAX HPLC unit. The column was developed with a linear
gradient of 0-50% acetonitrile in 0.1% trifluoroacetic acid over 25 min, followed by an additional
5 min at 50% acetonitrile, with a flow rate of 0.6 ml/min and detection at 280 nm. MALDI-
TOF mass spectral determination was performed on a Bruker Biflex III spectrometer at the
Facility for Advanced Instrumentation of the University of California, Davis.
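
By way of illustration only, the elution program described above (a linear 0-50% acetonitrile gradient over 25 min followed by a 5 min hold at 50%) can be expressed as a function of run time. The function name and the decision to reject times outside the 30 min program are assumptions made for this sketch.

```python
def percent_acetonitrile(t_min):
    """Acetonitrile percentage at time t_min (minutes) of the 30 min program."""
    if t_min < 0 or t_min > 30:
        raise ValueError("time must lie within the 0-30 min program")
    if t_min <= 25:
        # Linear ramp: 0% at 0 min to 50% at 25 min.
        return 50.0 * t_min / 25.0
    # Isocratic hold at 50% for the final 5 min.
    return 50.0

halfway = percent_acetonitrile(12.5)
during_hold = percent_acetonitrile(28)
```

A retention time read from the chromatogram can be passed to this function to estimate the solvent composition at which a species such as apo- or holo-BlmI elutes, ignoring any system dwell volume.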

In vivo labeling of BlmI with [3-³H]-β-alanine

The β-alanine auxotroph E. coli strain OG7001 (Epple et al. (1998) J.
Bacteriol. 180: 4950-4954) was transformed with pBS2 and cultured under the same
conditions as for E. coli BL21(DE3) (Novagen). For co-expression of blmI with gsp, pDPT-
Gsp (Ku et al. (1997) Chem. Biol. 4: 203-207) was similarly transformed into E. coli
OG7001(pBS2) and the transformants were cultured in 2×YT (Debabov et al. (1996) 178:
3869-3876) in the presence of kanamycin (25 µg/ml) and chloramphenicol (50 µg/ml). For
the in vivo labeling experiment, cells from 2 ml of overnight culture of either E. coli
OG7001(pBS2) or E. coli OG7001(pBS2/pDPT-Gsp) were harvested, washed with M9
minimal medium (Debabov et al. (1996) 178: 3869-3876), and resuspended in 2 ml of M9
minimal medium. The latter were used as seed cultures (20 µl) to inoculate 1 ml of M9
medium with kanamycin (25 µg/ml) or kanamycin (25 µg/ml) and chloramphenicol (50
µg/ml) for E. coli OG7001(pBS2) or E. coli OG7001(pBS2/pDPT-Gsp), respectively. The
resulting culture was incubated at 30 °C, 250 rpm to an OD600nm of 0.6, and to this was added 10
µCi of [3-³H]-β-alanine (50 Ci/mmol, American Radiolabeled Chemicals Inc., St. Louis, MO)
with or without IPTG (1 mM). Total proteins were resolved by SDS-PAGE on 15% gels
that were Coomassie blue-stained. To determine ³H-labeling of the overproduced holo-BlmI
protein, gels were soaked in Amplifier (Amersham, Arlington Heights, IL) for 20 min, dried
between two sheets of cellulose membrane (KOH Development Inc., Ann Arbor, MI), and
visualized by autoradiography on X-ray films (Fuji Medical Systems, Stamford, CT).

In vitro labeling of BlmI with [³H-pantetheine]-CoA

Expression of sfp in E. coli MV1190(pUC8-Sfp), purification of the Sfp
PPTase to homogeneity, and 4'-phosphopantetheinylation of apo-BlmI by Sfp in vitro were
carried out essentially according to literature procedures (Quadri et al. (1998) Biochemistry
37: 1585-1595; Nakano et al. (1992) Mol. Gen. Genet. 232: 313-321). A typical 100 µl
assay solution contained 26 µM apo-BlmI, 2.9 µM Sfp, 25 µM [³H-pantetheine]-CoA (0.9
µCi, 40 Ci/mmol), 10 mM MgCl2, and 5 mM DTT, in 75 mM MES/NaOAc buffer, pH 6.0.
After 30 min incubation at 37 °C, the assays were stopped by addition of 5 µl of bovine
serum albumin (0.2 mg/ml) and 0.9 ml of cold 10% (v/v) trichloroacetic acid (TCA). The
precipitated proteins were collected by centrifugation at 14,000 rpm, 20 min, 4 °C
(Eppendorf 5415C centrifuge), washed with 10% TCA three times, and resolved by SDS-
PAGE on a 15% gel. The ³H-activity incorporated into holo-BlmI was similarly determined
by autoradiography as described for in vivo labeling of holo-BlmI with [3-³H]-β-alanine.

Overexpression of val-A in E. coli and purification and assay of the Val-A
protein

The val-A fragment was amplified from Sv ATCC15003 by PCR using a
forward primer of 5'-GGA ATT C CA TAT G GG CAC CAC CGT CGC CGC G-3' (SEQ ID
NO:84, the NdeI site is underlined), and a reverse primer of 3'-GGC AAG CTT GGG ACC
GGG CGT GGA GCG C (SEQ ID NO:85, the HindIII site is underlined). The PCR-
amplified 1.6 kb NdeI-HindIII fragment was cloned into the same sites of pET-28a (Novagen)
to yield pBS3. Expression of val-A in E. coli BL21(DE3)(pBS3) and purification of the
resulting Val-A protein by affinity chromatography on Ni-NTA resin were carried out under
the standard conditions recommended by Novagen.

Amino acid-dependent ATP-PPi assays were performed essentially according
to the literature procedures (Ku et al. (1997) Chem. Biol. 4: 203-207; Lee and Lipmann
(1970) Methods Enzymol. 43: 585-602). A typical 100 µl assay solution contained 180 nM
Val-A, 1 mM ATP, 0.1 mM PPi with 0.2 µCi of ³²P-PPi (11.75 Ci/mmol, NEN Life Science
Products, Inc., Boston, MA), 1 mM MgCl2, 0.1 mM EDTA, and 1 mM L-amino acid in 50
mM sodium phosphate buffer, pH 7.8. After 30 min incubation at 30 °C, the assays were
stopped by addition of 0.9 ml of cold 1% (w/v) activated charcoal in 3% (v/v) perchloric
acid. The precipitates were collected on glass fiber filters (2.4 cm, G-4, Fisher, Pittsburgh,
PA), washed successively with 10 ml of 0.2 M sodium phosphate buffer, pH 8.0, 4 ml of water,
and 1 ml of ethanol, and dried in air. The filters were mixed with 7 ml of scintillation fluid
(ScintiSafe Gel, Fisher) and counted on a Beckman LS-6800 scintillation counter to
determine the radioactivity.

In vitro aminoacylation of holo-BlmI by Val-A

The aminoacylation of holo-BlmI was carried out essentially according to
literature methods (Stachelhaus et al. (1996) Chem. Biol. 3: 913-921; Weinreb et al. (1998)
Biochemistry 37: 1575-1584). A typical 100 µl assay solution contained 180 nM Val-A, 1.5-
2.8 µM apo- or holo-BlmI, 35 µM L-[¹⁴C(U)]-valine (283 mCi/mmol, NEN Life Science
Products, Inc., Boston, MA), 5 mM ATP, 10 mM MgCl2, and 5 mM DTT in 75 mM Tris-
HCl buffer, pH 8.0. The reactions were started by the addition of ATP and, after incubation
at 37 °C for 30 min, were stopped by addition of 0.9 ml of cold 7% (v/v) TCA. The
precipitated proteins were collected by centrifugation at 14,000 rpm, 20 min, 4 °C
(Eppendorf 5415C centrifuge) and resolved by SDS-PAGE on a 15% gel. The radioactivity
incorporated into the holo-BlmI-L-[¹⁴C(U)]valine species was similarly determined by
autoradiography as described for in vivo labeling of holo-BlmI with [3-³H]-β-alanine.

Example 3:

Cloning and characterization of a phosphopantetheinyl transferase from the
bleomycin-producing Streptomyces verticillus ATCC15003

Multienzyme complexes exist for acyl group activation and transfer reactions
in the biogenesis of fatty acids, the polyketide family of natural products (e.g. erythromycin,
tetracycline), and almost all non-ribosomal peptides (e.g. vancomycin, cyclosporin,
penicillin). All of these complexes contain one or more small proteins, ~80-100 amino acids
long, either as separate subunits or as integrated domains, that function as carrier proteins for
the growing acyl chain (acyl-, peptidyl-, and aryl-carrier proteins, abbreviated as ACP, PCP,
and ArCP). They are converted from inactive apo-forms to functional holo-forms by the
covalent attachment of the 4'-phosphopantetheine moiety of coenzyme A to a conserved
serine residue of the carrier-protein substrate. This essential post-translational modification
is catalyzed by a family of enzymes known as phosphopantetheinyl transferases (PPTases)
(Lambalot et al. Chem. Biol. (1996) 3:923-936; Walsh et al. Curr. Opin. Chem. Biol.
(1997) 1:309-315).

Research in the field of polyketide and non-ribosomal peptide biosynthesis 
has been hampered by the inability to fully modify and thus convert to the active form some 
polyketide synthases (PKS) and nonribosomal peptide synthetases (NRPS) when overproduced in
heterologous hosts, presumably because the host PPTases are unable to effectively modify
these overexpressed protein substrates. Our group is currently involved in the
characterization of the gene cluster responsible for the biosynthesis of the antitumor drug
bleomycin in Streptomyces verticillus ATCC15003. As bleomycin synthetase is a hybrid 
NRPS/PKS enzyme, we decided to obtain a PPTase from the producing organism in order to 
use it in vitro or in vivo by coexpression with the synthetase genes to produce properly 
modified, active synthetases for our studies. 

Results and Discussion 

Cloning of the pptA gene from S. verticillus ATCC15003.

The similarities among PPTases from different organisms are reduced to two
short motifs separated by 40-45 residues: (V/I)G(V/I)D and (F/W)(S/C/T)XKE(A/S)hhK
(Lambalot et al. Chem. Biol. (1996) 3:923-936; Walsh et al. Curr. Opin. Chem. Biol.
(1997) 1:309-315). Our previous attempts to amplify PPTase sequences from S. verticillus
chromosomal DNA using degenerate primers according to the two conserved motifs were
unsuccessful (unpublished results), so we decided to narrow our target. PPTases have been
classified in two groups, according to their specificity for the carrier-protein substrate:
PPTases involved in polyketide/fatty acid biosynthesis use acyl carrier proteins (ACPs) as
substrate, while those for non-ribosomal peptide biosynthesis use peptidyl carrier proteins
(PCPs) or aryl carrier proteins (ArCPs) (Walsh et al. Curr. Opin. Chem. Biol. (1997)
1:309-315). Several "NRPS-type" PPTase sequences were used to screen the databases to
look for actinomycete homologues, and four proteins of unknown function were found:
NshC from Streptomyces actuosus (Li et al. Gene (1990) 91:9-17), SC5A7.23 from S.
coelicolor (GenBank AL031107), an unnamed protein from Streptomyces sp. strain TH1
(Mori et al. J. Bacteriol. (1997) 179:5677-5683), and Rv2794c (later renamed PptT
(Quadri et al. Chem. Biol. (1998) 5:631-645)) from Mycobacterium tuberculosis (GenBank
AL008967). The alignment of the actinomycete sequences showed the two motifs conserved
in all PPTases and an additional motif - the "THC" motif: PXWPXGX2GS(M/L)THCXGY
(SEQ ID NO:86), located about 15 amino acids upstream of the (V/I)G(V/I)D motif (SEQ ID
NO:87). The "THC" motif is not universally conserved in all PPTases, but it can also be
detected in some non-actinomycete PPTases like EntD (Coderre et al. J. Gen. Microbiol.
(1989) 135:3043-3055). Using a recently developed method of PCR primer design (the
CODEHOP strategy (COnsensus-DEgenerate Hybrid Oligonucleotide Primer)) (Rose et al.
Nucleic Acids Res. (1998) 26:1628-1635), two primers were designed around the typical C-
terminal PPTase motif (primers KEA-1: 5'-T GCA GCA GAA CAG GAG GCKNYC CCA
NKG-3' (SEQ ID NO:88) and KEA-2: 5'-TG GGT CAG CGG GTA CCA NRC YTT RWA-
3' (SEQ ID NO:89); H=C+A, N=A+C+T+G, Y=C+T, K=G+T, R=A+G, W=T+A), and one
primer was designed from the "THC" motif (primer THC: 5'-C GGC ATG GTC GGC TCC
HTN ACN CAY TG-3', SEQ ID NO:90, H=C+A, N=A+C+T+G, Y=C+T, K=G+T,
R=A+G, W=T+A; this motif is not universally conserved in PPTases of all organisms).
Using S. verticillus chromosomal DNA as template, no amplification product was detected
using the THC and the KEA-1 primers. The set of primers THC/KEA-2 successfully
amplified a single band of the expected size (about 250 bp), which was gel-purified and
cloned. Eight individual clones were sequenced, and all of them proved to be identical
(except for differences due to primer utilization) and highly similar to the putative actinomycete
PPTases. The PCR fragment was used as a probe to screen a S. verticillus genomic library
by colony hybridization. Of the 10,000 colonies screened, 25 positive clones were
identified, and then confirmed by Southern analysis to contain the same 4.6-kb BamHI
hybridizing band. The 4.6-kb DNA fragment was subcloned, and the nucleotide sequence
of a 1,761-bp BamHI-Sah region was determined (SEQ ID NO. 3).
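
By way of illustration only, the degeneracy of the three primers described above (the number of distinct oligonucleotides represented by each degenerate sequence) can be computed from the base codes given in the text. The code table follows the legend in the text literally, including H=C+A, which differs from the standard IUPAC meaning of H (A, C, or T); the function name and dictionary layout are hypothetical.

```python
from math import prod

# Degenerate base codes exactly as defined in the text's legend.
# Note: the text gives H = C+A; standard IUPAC uses M for that pair.
CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "H": "CA", "N": "ACTG", "Y": "CT",
    "K": "GT", "R": "AG", "W": "TA",
}

# Primer sequences copied from the text with spaces removed.
PRIMERS = {
    "KEA-1": "TGCAGCAGAACAGGAGGCKNYCCCANKG",
    "KEA-2": "TGGGTCAGCGGGTACCANRCYTTRWA",
    "THC": "CGGCATGGTCGGCTCCHTNACNCAYTG",
}

def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    return prod(len(CODES[base]) for base in primer)

degeneracies = {name: degeneracy(seq) for name, seq in PRIMERS.items()}
```

In each primer the degenerate positions are confined to the 3' end, while the 5' end is a fixed consensus sequence, which is the general layout of a CODEHOP primer.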

Sequence analysis of the pptA locus.

The sequence of the 1,761-bp BamHI-Saa fragment was analyzed for coding
regions by using the CODONPREFERENCE and TESTCODE programs of the GCG
package (Genetics Computer Group, Madison, Wisconsin). Two complete ORFs (pptA,
orf3) and two incomplete ORFs (orf1, orf4) were identified within the sequenced region
(Figure 13). The first ORF from left to right (designated orf1) starts outside the sequenced region
and ends with a TGA codon at position 248 of the sequenced fragment. Comparison of the
deduced product of orf1 with proteins in databases showed similarities with Rv2795c from
Mycobacterium tuberculosis (GenBank AL008967) and SC5A7.22 from S. coelicolor
(GenBank AL031107), both of unknown function. The second ORF, pptA, contains the
sequence amplified by PCR and used for the cloning of this locus. It comprises 741
nucleotides, starting with a GTG codon (position 245) which is coupled to the stop codon of
orf1, and ending with a TAA codon. The starting codon of pptA is preceded by a potential
ribosomal binding site (RBS), GGGAG. The overall (76.6%) and third codon position
(93.9%) G+C contents and the codon usage of pptA are similar to those found in other
Streptomyces genes, with the exception of the stop codon (TAA), which is most uncommon
in this group of organisms (Wright et al. Gene (1992) 113:55-65). The pptA gene encodes a
protein of 246 amino acids with a predicted molecular mass of 25,619 Da and a pI of 4.76,
which contains the conserved PPTase motifs. Database searches with PptA showed
significant similarities to the putative actinomycete PPTases (39-52%/48-61%
identity/similarity) and to confirmed bacterial PPTases such as EntD from E. coli
(17%/24% identity/similarity) (Lambalot et al. Chem. Biol. (1996) 3:923-936). The third
ORF, orf3, is separated from pptA by an apparently noncoding DNA region of 153 bp, and it
is transcribed in opposite and convergent direction with respect to orf1-pptA. The gene orf3
comprises 240 nucleotides, starting with an ATG codon (position 1358) and ending with
TGA. The starting codon of orf3 is preceded by the sequence GAAGG, a potential RBS.
The deduced product of orf3 is a protein of 79 amino acids with a predicted mass of
7,555 Da and a pI of 7.17. The Orf3 protein shows similarities to the N-terminal region of
SC5H1.35c, a protein of unknown function from S. coelicolor (GenBank AL049863).
Analysis of Orf3 with the SignalP program (Nielsen et al. Protein Engineer. (1997) 10:1-6)
predicts an N-terminal signal peptide which would be cleaved between residues 27 and 28
(ALA-DS), suggesting that the mature protein (52 amino acids, 5,099 Da, pI 4.31) would be
secreted. Between orf3 and orf4 there is an apparently noncoding region of 251 nucleotides.
The orf4 gene is transcribed in opposite and divergent direction with respect to orf3. It starts
with an ATG codon at position 1610, preceded by a potential RBS (GGAGG), and ends outside
the sequenced fragment. The deduced protein product (50 amino acids) of the incomplete
orf4 contains a potential NAD/FAD binding motif, GXGX2GX3GX6G (Scrutton et al.
Nature (1990) 343:38-43), showing low similarities to diverse oxidoreductases.
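
By way of illustration only, the overall and third-codon-position G+C contents reported above for pptA can be computed for any in-frame coding sequence as sketched below. The function names are hypothetical, and the 12-nucleotide toy ORF is invented for the example; it is not part of the pptA sequence.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)

def gc3_fraction(orf):
    """G+C fraction at the third position of each codon of an in-frame ORF."""
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    # Take every third base, starting from the third base of the first codon.
    return gc_fraction(orf[2::3])

# Hypothetical toy ORF: GTG GCC GAC TAA
toy_orf = "GTGGCCGACTAA"
overall = gc_fraction(toy_orf)
third_position = gc3_fraction(toy_orf)
```

A third-position G+C content well above the overall value, as reported for pptA, is the codon-usage bias typical of Streptomyces genes that the analysis above relies on.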

Heterologous expression and biochemical characterization of PptA.

In order to test whether pptA actually encodes a functional PPTase, we decided to
overproduce and purify the PptA protein, and assay its catalytic competence on putative
substrate proteins or domains. The pptA coding sequence was amplified by PCR and cloned
into the T5-promoter-based pQE-70 vector, yielding plasmid pQEPPT, in such a way that a
hexahistidine tag would be added at the C-terminus of the protein. Expression of the
pQEPPT construct in E. coli M15(pREP4) resulted in the overproduction of soluble His-
tagged PptA, which was readily purified by affinity chromatography on Ni-NTA agarose
under non-denaturing conditions (FIGURE). Because PptA belongs, by sequence similarity,
to the subfamily of PPTases involved in nonribosomal peptide synthesis, we first assayed its
activity using two different apo-PCPs as protein substrates. The first one, BlmI, has been
previously characterized in our laboratory as a discrete peptidyl carrier protein, or type II
PCP, whose gene is found within the bleomycin-biosynthesis gene cluster of S. verticillus
(Du et al. Chem. Biol. (1999) 6:507-517). For the second PCP substrate we used BlmX, a
bimodular NRPS protein encoded in the same cluster (Fig. 2), as a source of a type I PCP,
i.e., a PCP included in a multidomain NRPS. For the production of this type I PCP, we
amplified by PCR a 1,898 bp fragment encoding the adenylation and PCP domains from the
second module of BlmX. This DNA fragment was cloned into pMAL-c2x to yield
pMAL1617, in which the type I PCP would be produced as a maltose-binding protein (MBP)
fusion, MBlmX-2, with a predicted molecular mass of 108.5 kDa. Introduction of
pMAL1617 into E. coli TB1 resulted in good overproduction of MBlmX-2, about 40% of which
was soluble, and the soluble protein was purified by affinity chromatography using amylose resin. To test the
PPTase activity, we incubated the purified PptA with BlmI and MBlmX-2 as putative protein
substrates in the presence of [³H]-(pantetheinyl)-CoASH, and the tritiated products were
subjected to SDS electrophoresis and autoradiography. The well-characterized PPTase Sfp
from B. subtilis, which exhibits a broad specificity for its protein substrate (Quadri et al.
Biochemistry (1998) 37: 1585-1595), was included as a positive control. In these
experiments PptA exhibited a robust phosphopantetheinylation activity on both BlmI and
MBlmX-2. Having demonstrated that PptA does in fact have PPTase activity on both type I
and type II PCP substrates from nonribosomal peptide synthetases, we then proceeded to test
two different acyl carrier proteins (ACPs) as potential substrates. The first one, BlmVIII, is
a monomodular multidomain polyketide synthase (PKS) which is encoded in the bleomycin-
biosynthesis gene cluster of S. verticillus (Fig. 2). BlmVIII contains an ACP domain at its
C-terminus, that is, a type I ACP. For the second ACP substrate we used TcmM, a type II
acyl carrier protein involved in the biosynthesis of the aromatic polyketide tetracenomycin C
in S. glaucescens (Shen et al. J. Bacteriol. (1992) 174:3818-3821; Bao et al. Biochemistry
(1998) 37: 8132-8138). For the production of TcmM, its coding sequence was transferred
from a construct previously made in pET-22b (Gehring et al. Chem. Biol. (1997) 4:17-24)
into the pET-28a vector to yield pET28a-TcmM, in such a way that a hexahistidine tag
would be added at both the N-terminus and the C-terminus of the protein. Plasmid pET28a-
TcmM was introduced into E. coli BL21(DE3), and TcmM was easily purified by affinity
chromatography using Ni-NTA resin. In vitro phosphopantetheinylation assays were
performed as before, but using BlmVIII and TcmM as protein substrates, and PptA was able
to posttranslationally modify both ACP substrates.

The pptA gene is not clustered with the bleomycin-biosynthesis locus.

Some bacterial PPTase genes have been found clustered with, or close to, their
respective "partner" NRPS genes: entD {enterobactin (Coderre et al. J. Gen. Microbiol.
(1989) 135:3043-3055)}, sfp {surfactin (Cosmina et al. Mol. Microbiol. (1993) 8:821-
831)}, gsp {gramicidin (Borchert et al. J. Bacteriol. (1994) 176:2458-2462)}, bli
{bacitracin (Gaidenko et al. Biotechnologia (1992) 13-19)}, lpa-14 {iturin (Huang et al. J.
Ferment. Bioeng. (1993) 76:445-450)}. To test the possible clustering of pptA with the
bleomycin-biosynthesis (blm) locus, PCR reactions were performed using the THC/KEA-2
primers on several overlapping cosmid clones spanning the blm locus plus 30-40 kb
upstream and downstream of its putative limits. No amplification product could be obtained
in these reactions, showing that the pptA gene is not clustered with the blm locus.

Discussion 

It has been suggested that in organisms containing multiple 
phosphopantetheine-requiring pathways, each pathway has its own posttranslational 
modifying activity (Walsh et al. Curr. Opin. Chem. Biol. (1997) 1:309-315). Our group
has found that S. verticillus ATCC15003 contains several PKS and NRPS gene clusters, one
of them being responsible for bleomycin production (a hybrid NRPS/PKS system) (Shen et
al. Bioorg. Chem. (1999) 27:155-171; Du et al. Chem. Biol. (1999) 6:507-517). This
suggested that the gene encoding the PPTase for the BLM NRPS could also be clustered with, or
close to, the NRPS genes. However, we have not found this gene after sequencing almost
the whole blm NRPS locus. Because having this gene could be important for
expressing functional NRPS modules from the blm cluster, we decided to clone the PPTase
gene. Additionally, if the "one NRPS cluster - one PPTase" hypothesis were true, it seemed
possible to use PPTase sequences as a new kind of probe to clone novel NRPS clusters.

We know that in S. verticillus there are several NRPS loci (possibly four), so
we expected several "PCP-type" PPTases. However, we have amplified only one, and it does
not seem to be closely linked to any of the NRPS loci. Interestingly, in the actinomycete
Mycobacterium tuberculosis, whose genome is fully sequenced, there is only one PCP-type
PPTase gene, which is not clustered with either of the two NRPS loci present in this organism
(Quadri et al. Chem. Biol. (1998) 5:631-645). These and other indirect lines of evidence suggest
that cluster-specific PPTases are not the general rule but most probably the
exception, especially in organisms containing multiple NRPS clusters. There is also strong
evidence that at least some PCP-type PPTases can posttranslationally modify PCPs from
different clusters and even different organisms (Quadri et al. Chem. Biol. (1998) 5:631-645;
Gehring et al. Biochemistry (1998) 37:11637-11650). It is most likely that there is only one
PCP-type PPTase in S. verticillus and that its gene is not necessarily clustered with any of the
NRPS loci.

Biochemical characterization of the purified PptA protein confirmed not only
its PPTase activity but also its broad specificity, comparable to that of Sfp. Different
apo-PCPs (type I and type II) and a type-I apo-ACP from the bleomycin synthetase, and the
type-II apo-ACP from the tetracenomycin PKS of Streptomyces glaucescens, were efficiently used
as substrates by PptA. These results suggest that PptA is a good candidate for heterologous
coexpression with NRPS and PKS genes to overproduce active holo-synthase enzymes.

It is understood that the examples and embodiments described herein are for 
illustrative purposes only and that various modifications or changes in light thereof will be 
suggested to persons skilled in the art and are to be included within the spirit and purview of 
this application and scope of the appended claims. All publications, patents, and patent 
applications cited herein are hereby incorporated by reference in their entirety for all 
purposes. 
CLAIMS

What is claimed is: 

1. An isolated nucleic acid comprising a nucleic acid selected from the
group consisting of:

a nucleic acid encoding any one of Blm open reading frames (ORFs) 8
through 41;

a nucleic acid encoding a polypeptide encoded by any one of Blm
open reading frames (ORFs) 8 through 41; and

a nucleic acid amplified by polymerase chain reaction (PCR) using
any one of the primer pairs identified in Table II and the nucleic acid of a
bleomycin-producing organism as a template.

2. The isolated nucleic acid of claim 1, wherein said nucleic acid
comprises a nucleic acid encoding at least two open reading frames selected from the group
consisting of Blm open reading frames 8 through 41.

3. The isolated nucleic acid of claim 1, wherein said nucleic acid
comprises a nucleic acid encoding at least three open reading frames selected from the group
consisting of Blm open reading frames 8 through 41.

4. The isolated nucleic acid of claim 1, wherein said nucleic acid 
comprises a nucleic acid encoding a C domain lacking one or more His residues of the 
conserved HHxxxDG active site for transpeptidation. 

5. The isolated nucleic acid of claim 1, wherein said nucleic acid
comprises a nucleic acid encoding a protein encoded by a gene selected from the group
consisting of blmI, blmII, and blmXI.

6. An isolated nucleic acid comprising a nucleic acid encoding a module 
comprising two or more catalytic domains of a protein encoded by a nucleic acid of a 
bleomycin gene cluster wherein said catalytic domains are selected from the group consisting
of a condensation (C) domain, an adenylation (A) domain, a peptidyl carrier protein (PCP)
domain, a condensation/cyclization domain (Cy), an acyl-carrier protein (ACP)-like domain,
an oxidization domain (Ox), a ketoacyl synthase (KS) domain, an acetyl transferase (AT)
domain, a ketoreductase (KR) domain, and a methyltransferase (MT) domain.

7. The isolated nucleic acid of claim 6, wherein said nucleic acid
comprises a nucleic acid encoding one or more proteins comprising a module selected from
the group consisting of NRPS-0, NRPS-1, NRPS-2, NRPS-3, NRPS-4, NRPS-5, NRPS-6,
NRPS-7, NRPS-8, NRPS-9, and PKS.

8. The isolated nucleic acid of claim 7, wherein said nucleic acid 
comprises an open reading frame from SEQ ID NO: 1, SEQ ID NO: 2, or SEQ ID NO: 3. 

9. An isolated nucleic acid comprising a nucleic acid encoding a protein
encoded by a gene from a BLM gene cluster.

10. The nucleic acid of claim 9, wherein said nucleic acid comprises a
nucleic acid encoding a protein encoded by a gene selected from the group consisting of
blmI, blmII, and blmXI.

11. The nucleic acid of claim 9, wherein said nucleic acid comprises a
nucleic acid encoding a protein encoded by a gene selected from the group consisting of
blmIII, blmIV, blmV, blmVI, blmVII, blmIX, and blmX.

12. The nucleic acid of claim 9, wherein said nucleic acid comprises a
nucleic acid encoding a protein encoded by blmVIII.

13. The nucleic acid of claim 9, wherein said nucleic acid comprises a
nucleic acid selected from the group consisting of blmI, blmII, and blmXI.

14. The nucleic acid of claim 9, wherein said nucleic acid comprises a
nucleic acid selected from the group consisting of blmIII, blmIV, blmV, blmVI, blmVII,
blmIX, and blmX.

15. The nucleic acid of claim 9, wherein said nucleic acid comprises
blmVIII.

16. An isolated nucleic acid comprising a nucleic acid that encodes a
protein comprising at least one catalytic domain selected from the group consisting of a
condensation (C) domain, an adenylation (A) domain, a peptidyl carrier protein (PCP)
domain, a condensation/cyclization domain (Cy), an acyl-carrier protein (ACP)-like domain,
an oxidization domain (Ox), a ketoacyl synthase (KS) domain, an acetyl transferase (AT)
domain, a ketoreductase (KR) domain, and a methyltransferase (MT) domain, and that
hybridizes to a nucleic acid selected from the group consisting of orf8, orf9, orf10, orf11,
orf12, orf13, orf14, orf15, orf16, orf17, orf18, orf19, orf20, orf21, orf22, orf23, orf24,
orf25, orf26, orf27, orf28, orf29, orf30, orf31, orf32, orf33, orf34, orf35, orf36, orf37, orf38,
orf39, orf40, and orf41 under stringent conditions.

17. The nucleic acid of claim 16, wherein said isolated nucleic acid 
comprises a nucleic acid encoding a module. 

18. The nucleic acid of claim 16, wherein said isolated nucleic acid
comprises a nucleic acid encoding a BLM gene. 

19. An isolated nucleic acid comprising a nucleic acid selected from the
group consisting of orf8, orf9, orf10, orf11, orf12, orf13, orf14, orf15, orf16,
orf17, orf18, orf19, orf20, orf21, orf22, orf23, orf24, orf25, orf26, orf27, orf28, orf29,
orf30, orf31, orf32, orf33, orf34, orf35, orf36, orf37, orf38, orf39, orf40, and orf41, or an
allelic variant thereof.

20. The nucleic acid of claim 19, wherein said nucleic acid comprises a
nucleic acid that is a single nucleotide polymorphism (SNP) of a nucleic acid selected from
the group consisting of orf8, orf9, orf10, orf11, orf12, orf13, orf14, orf15,
orf16, orf17, orf18, orf19, orf20, orf21, orf22, orf23, orf24, orf25, orf26, orf27, orf28,
orf29, orf30, orf31, orf32, orf33, orf34, orf35, orf36, orf37, orf38, orf39, orf40, and orf41.

21. An isolated gene cluster comprising open reading frames encoding 
polypeptides sufficient to direct the assembly of a bleomycin. 

22. An isolated multi-functional protein complex comprising both a
polyketide synthase (PKS) and a peptide synthetase (NRPS).

23. An isolated nucleic acid encoding a multi-functional protein complex
comprising both a polyketide synthase (PKS) and a peptide synthetase (NRPS).
24. An isolated polypeptide comprising a catalytic domain encoded by a
nucleic acid of a bleomycin gene cluster wherein said nucleic acid comprises a nucleic acid
selected from the group consisting of:

a nucleic acid encoding any one of Blm open reading frames (ORFs) 8
through 41; and

a nucleic acid amplified by polymerase chain reaction (PCR) using
any one of the primer pairs identified in Table II.

25. The polypeptide of claim 24, wherein said polypeptide comprises an
enzymatic domain selected from the group consisting of a condensation (C) domain, an
adenylation (A) domain, a peptidyl carrier protein (PCP) domain, a condensation/cyclization
domain (Cy), an acyl-carrier protein (ACP)-like domain, an oxidization domain (Ox), a
ketoacyl synthase (KS) domain, an acetyl transferase (AT) domain, a ketoreductase (KR)
domain, and a methyltransferase (MT) domain.

26. The polypeptide of claim 25, wherein the nucleic acid of a bleomycin
gene cluster comprises a nucleic acid encoding at least two open reading frames selected
from the group consisting of Blm open reading frames 8 through 41.

27. The polypeptide of claim 25, wherein said nucleic acid of a bleomycin
gene cluster comprises a nucleic acid encoding at least three open reading frames selected
from the group consisting of Blm open reading frames 8 through 41.

28. The polypeptide of claim 25, wherein said polypeptide comprises a C
domain lacking one or more His residues of the conserved HHxxxDG active site for
transpeptidation.

29. The polypeptide of claim 25, wherein said polypeptide is a polypeptide
encoded by a gene selected from the group consisting of blmI, blmII, and blmXI.

30. An isolated polypeptide comprising a module comprising two or more 
catalytic domains of a protein encoded by a nucleic acid of a bleomycin gene cluster wherein 
said catalytic domains are selected from the group consisting of a condensation (C) domain, 
an adenylation (A) domain, a peptidyl carrier protein (PCP) domain, a 
condensation/cyclization domain (Cy), an acyl-carrier protein (ACP)-like domain, an 
oxidization domain (Ox), a ketoacyl synthase (KS) domain, an acetyl transferase (AT)
domain, a ketoreductase (KR) domain, and a methyltransferase (MT) domain.

31. The polypeptide of claim 30, wherein said polypeptide comprises a
module selected from the group consisting of NRPS-0, NRPS-1, NRPS-2, NRPS-3, NRPS-4,
NRPS-5, NRPS-6, NRPS-7, NRPS-8, NRPS-9, and PKS.

32. An isolated polypeptide encoded by a gene from a BLM gene cluster. 

33. The polypeptide of claim 32, wherein said polypeptide is encoded by a
gene selected from the group consisting of blmI, blmII, and blmXI.

34. The polypeptide of claim 32, wherein said nucleic acid comprises a
nucleic acid encoding a protein encoded by a gene selected from the group consisting of
blmIII, blmIV, blmV, blmVI, blmVII, blmIX, and blmX.

35. The polypeptide of claim 32, wherein said polypeptide is encoded by
blmVIII.

36. An isolated polypeptide comprising a module wherein said module is
specifically bound by an antibody that specifically binds to a BLM module selected from the
group consisting of NRPS-0, NRPS-1, NRPS-2, NRPS-3, NRPS-4, NRPS-5, NRPS-6,
NRPS-7, NRPS-8, NRPS-9, and PKS.

37. The polypeptide of claim 36, wherein said polypeptide is specifically
bound by an antibody that specifically binds to a polypeptide encoded by a gene selected
from the group consisting of [gene names illegible in source]
blmIX, blmX, and blmVIII.

38. An isolated polypeptide comprising a polypeptide encoded by an open
reading frame of a nucleic acid selected from the group consisting of SEQ ID NO:1, SEQ ID
NO:2, and SEQ ID NO:3, or an allelic variant thereof.

39. The polypeptide of claim 38, wherein said nucleic acid comprises a
single nucleotide polymorphism (SNP) of an open reading frame of a nucleic acid selected from the
group consisting of SEQ ID NO:1, SEQ ID NO:2, and SEQ ID NO:3.
40. An expression vector comprising a nucleic acid of any one of claims 1
through 23.

41. A host cell transformed with an expression vector of claim 40. 

42. The host cell of claim 41, wherein said cell is transformed with an
exogenous nucleic acid comprising a gene cluster encoding polypeptides sufficient to direct
the assembly of a bleomycin or bleomycin analog.

43. The cell of claim 41, wherein said cell is a bacterial cell.

44. The cell of claim 43, wherein said cell is a Streptomyces cell.

45. The cell of claim 41, wherein said cell is a eukaryotic cell.

46. A method of chemically modifying a biological molecule, said method 
comprising contacting a biological molecule that is a substrate for a polypeptide encoded by 
one or more bleomycin biosynthesis gene cluster open reading frames with the polypeptide 
encoded by one or more bleomycin biosynthesis gene cluster open reading frames, whereby 
said polypeptide chemically modifies said biological molecule. 

47. The method of claim 46, wherein said method comprises contacting
said biological molecule with at least two different polypeptides encoded by blm gene cluster
open reading frames.

48. The method of claim 46, wherein said method comprises contacting
said biological molecule with at least three different polypeptides encoded by blm gene
cluster open reading frames.

49. The method of claim 46, wherein said contacting is in a host cell. 

50. The method of claim 49, wherein said host cell is a bacterium. 

51. The method of claim 46, wherein said contacting is ex vivo.

52. The method of claim 46, wherein said biological molecule is an 
endogenous metabolite produced by said host cell. 
53. The method of claim 46, wherein said biological molecule is an
exogenously supplied metabolite.

54. The method of claim 46, wherein said host cell is a eukaryotic cell. 

55. The method of claim 54, wherein said eukaryotic cell is selected from
the group consisting of a mammalian cell, a yeast cell, a plant cell, a fungal cell, and an
insect cell.

56. The method of claim 46, wherein said biological molecule is an amino 
acid and said polypeptide is a peptide synthetase. 

57. The method of claim 46, wherein said polypeptide is a methyl
transferase.

58. A method of coupling a first amino acid to a second amino acid, said
method comprising contacting the first and second amino acid with a recombinantly
expressed bleomycin nonribosomal peptide synthetase (NRPS).

59. The method of claim 58, wherein said NRPS is selected from the
group consisting of NRPS-5, NRPS-4, NRPS-3, NRPS-9, NRPS-8, and NRPS-7.

60. The method of claim 58, wherein said NRPS is selected from the
group consisting of NRPS-6, NRPS-2, NRPS-1, and NRPS-0.

61. The method of claim 58, wherein said contacting is in a host cell.

62. A method of coupling a first fatty acid to a second fatty acid, said
method comprising contacting the first and second fatty acids with a recombinantly
expressed bleomycin polyketide synthase (PKS).

63. The method of claim 62, wherein said contacting is in a host cell.

64. A method of producing a bleomycin or bleomycin analog, said method
comprising:

providing a cell transformed with an exogenous nucleic acid
comprising a bleomycin gene cluster encoding polypeptides sufficient to direct the assembly
of said bleomycin or bleomycin analog;

culturing the cell under conditions permitting the biosynthesis of
bleomycin or bleomycin analog; and

isolating said bleomycin or bleomycin analog from said cell.

65. An isolated nucleic acid comprising a nucleic acid encoding a
phosphopantetheinyl transferase, said nucleic acid encoding a phosphopantetheinyl
transferase being selected from the group consisting of:

a nucleic acid encoding the protein encoded by the nucleic acid of
SEQ ID NO:3;

a nucleic acid amplified by polymerase chain reaction (PCR) using
primers that specifically amplify ORF 41 (primers: SEQ ID NO:71 and SEQ ID NO:72) and
Streptomyces nucleic acid as a template; and

a nucleic acid encoding a polypeptide having phosphopantetheinyl
transferase activity where said nucleic acid specifically hybridizes to the nucleic acid of SEQ
ID NO:3 under stringent conditions.

66. The nucleic acid of claim 65, said nucleic acid comprising a nucleic
acid of SEQ ID NO:3.

67. A polypeptide comprising a phosphopantetheinyl transferase encoded 
by SEQ ID NO:3. 

68. A vector comprising the nucleic acid of claim 66. 

69. A cell transfected with the vector of claim 68.

70. A method of converting an apo-carrier protein to a holo-carrier protein
comprising reacting said apo-carrier protein with a recombinant phosphopantetheinyl
transferase encoded by SEQ ID NO:3 and coenzyme A thereby producing a holo-carrier
protein.

71. A cell comprising a modified bleomycin gene cluster nucleic acid, 
said cell producing elevated amounts of bleomycin as compared to the wild type cell. 

72. The cell of claim 71, wherein said cell overexpresses a resistance gene
from the bleomycin gene cluster.

73. The cell of claim 72, wherein said resistance gene is a gene listed in
Table III.
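
As an informal illustration of the conserved HHxxxDG condensation (C) domain motif recited in the claims above, the following Python sketch shows one way a translated open reading frame might be screened for an intact motif, and how one might count the His positions missing from a seven-residue window that has already been aligned to the motif. The function names and the example peptide strings are hypothetical; they are not taken from the blm sequences, and the alignment step is assumed to be performed separately.

```python
import re

# HHxxxDG in one-letter amino acid code: His-His, any three residues, Asp-Gly.
INTACT_MOTIF = re.compile(r"HH...DG")


def find_intact_motifs(protein: str) -> list[int]:
    """Return 1-based start positions of every intact HHxxxDG motif."""
    return [m.start() + 1 for m in INTACT_MOTIF.finditer(protein.upper())]


def missing_his_count(window: str) -> int:
    """Count how many of the two conserved His positions are not His.

    `window` is a seven-residue stretch assumed to be already aligned
    to HHxxxDG; this function does not perform the alignment itself.
    """
    window = window.upper()
    if len(window) != 7:
        raise ValueError("window must be exactly seven residues long")
    return sum(1 for residue in window[:2] if residue != "H")


if __name__ == "__main__":
    # Hypothetical peptide fragments, invented for illustration only.
    print(find_intact_motifs("MKHHAAADGLL"))
    print(missing_his_count("HHAAADG"))
    print(missing_his_count("QHAAADG"))
```

A window with a nonzero count would correspond to a C domain "lacking one or more His residues" in the sense used in the claims; whether such a domain is catalytically active is a biochemical question that this sketch does not address.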
18601 ACCCATCTCATAGGTGTACGCGCTGGAGCATTCGGGGCACGACGGAAGGTTCTCGGTCAC

18661 GAGAGCACTCTAAGCCCGAACCCGCAAGGATGACGAATTGCAAAATTGTGCAAGTCGCTA 

18 721 CATGATGGTCCGGCTGTGCCCGCAGGTAGCCGCGGGCACAGCACCAGACGCTGCCTCCGC 

18781 GCACCGCGCGGGAGGCCCGGTGAGGCX^^ 

1884X CAGGCGAAGGCCGAGTTCTTCCGGATGCTGGGGCACCCGGTCCGCATCCGCG^ 

QAKAEFFRMLGHPVKJ- 

18901 CTGCTGCAGGACGGGCCGATGCCGGTGCGTGATCTGCTGG^ 

1B961 TOMCOCMio^^ 
190 21 ACGGGTTCCACGGTCGTCTAC^ 
19081 CCGCGCCGCATCCTG^^ 

19X41 GAAGCCGAGGTCAGTGCCCGGTGAGCTCCCTCGCCGTCCGGGTGGGAGCCCGGGTGCGTT 
E A E V S A R M * 8 S L A V R V G A R V R S 
GLGAEAGLAlAvv«
FGGSNLQVSGPTOrA"!

19441 TGCCCATCGTCG^ 

19501 TGATGCTGATCGCGCTCGCCCTCGCCCGCGCCGGCCGCTACATGCAGTACGTGCCGG^ 

MLIALALAKAVjki 

19561 CGGTGGTGGAGGGCTTCACCCrCGGCATCGCCTGCGTGATCGGCTTGCAGCAGGTGCCGA 

VVEGFTLGlACVX^^vv 

19621 ACGCCCTGGGAGTCGCCAAGCCGGAGGGCGACAAGGTCCT 

ALGVAKPEGDKV u 

196 81 TCGAGACCTTCG^^ 

19 741 CGGTCATGCTGACCGGCGCGCGGTGGCGGCCGGTCGTTCTC 

19801 CCGGTGCCACCGTCGTGGCCCAGCTGTGCCACCTGGACGCGGCCCGCCCGATCGGGGACC 
GATVVACLCHLDAAKfi 

19861 TGCCCGCGGGGCTGCCCGCCCCGTCGCTG^^ 

PAGLPAPSLAFLDL-omw 



WO 00/40704 PCT/USOO/00445 




CCGTCGCGGA^ 
CC^CAC^ 

ccLLlccc^ 
ccgJ^ 

^CG^^ 
CCGTGG^^^ 

JcAcLlLrrcTTCccccccc^ 
dgplf faaahr 

cggacgtgcgcgtgg^atcc^^^ 

CCCTCMCCTO^^ 

ccIg^ 
ggLgccacc^^^^ 

20321 ACCTGCACGGCGCCGG^^^^ 

PCAPSARR* 

,1001 GAGGCGGCGGTGGACGGCCTGCCGCCGCCGGCCTCGGGCTGATCGGCGTGATCACCGCCC 

,1061 ATGCGCGGGTGGGCGCCCGCGGCATCGTGGGCGGGACCGTGTTCCCGGCCACCGCGGCGG 

21121 CCGGCCTCGCGCTGGGCGTGGCCTGCCGCGGTGCCTGGTAGCGGCGGGGTCCGGCGGCCG 

21181 GGCCTGTGCCTCTTCCCGCCCGTCCGGCGGGTGGCGCCGCGCCGGCGGTGACAGGGAAAT 

21241 ATGACCGGAACTGGGATGCTCGCGTCCACTCGGGTGTGTTTARGTGCCACGGGGGCTTCC 

21301 GA03G(XCGTCGCGCGCCGGCGGTTCGCCCGATGATGGTCGTGCGGCGCTGTGAGCCGGG 

2 i3si gagcctatggcac^ggacctgaacgactggatcg ^ r 2 f ° 8) 

21421 AAGCCT^^^ 21480 
2l 48 l oJgAtLaCC^ 21540 
21541 21600 




21601 CGTCACGAATCGTTCGGTCACCGGTGCCTGGTGATCGGCATCTTCATGACC^CTTCGAC 21660 
RDESFGHRCLVIGIFMTFFD 

21661 GTGCACATCAACCGGATGCCTTACGGCGGCCGTCTCTCCTTCGCGCTCAAGGAGCCCATC 21720 
VHINRMPYGGRLSFAL.KEPI 

21721 GGGACGTTCAACCTCCCCATGCTGGC<^TGGAGCAGGACCTGCTCGAACGGCTCCGGGTC 21780 
GTFNLPMLAMEQDbLERLRV 

21781 AATCCGGCTCACGCGAGGTATCTGCACCTGAACGAGCGGATGGTCAACCGGGTCGACGCG 21840 
NPAHARYLHLNERMVNRVDA 

21841 CCGCGGCTCCGGGGCCCGTACTGGATGCTCCAGATCGCCGACTACGACGrCGACTCCATC 21900 
PRLRGPYWMLQIADYDVDS I 

21901 ACCCCGTTCTGCAGACGGCAGGGAATGTTCCGCTCCCAGGGGCGCCGCTTCTCCCAGATC 21960 
TPFCRRQGMFRSQGRRFSQI 

21961 CGCTACGGATCGCAGGTCGACCTGGTGATCCCGATGGCGGCCGACCGCGAGTACGTCCCC 22020 
RYGSQVDLVIPMAADREYVP 

22021 GTGGAGGCCGTCGGCCGGCACGTGAAGGCGGGGCTCGACCCGCTCGTCAAGATCCGGTGG 22080 
VEAVGRHVKAGLDPLVKI RW 

22081 CGTTGAAGAGCGCGTACGAAGCGATGGCGAACTGGAGGGACACAGCGTGGGTTTCCGTCG 2214 0 

^ # M G F R R (orr27) 

22141 AGCGCAGAGGGCCGGTGGGCCGGGAGCGGGCCGGCGGGAGAGCGCCCGGTTCAGGCCGGA 22200 
AQRAGGPGAGRRESARFRPD 

22201 CGGGCCGTCGGCGCCGCGGGACCGTCCGTTACCCCTGTCCGCCGGGCAGTTGTTCGAGTG 22260 
GPSAPRDRPLPLSAGQ LFEW 

22261 GGTGTTTGACAAGCTCGTCGACGGAGATCTGAGCCACCAGCCGACGATTGTGCGGCTCCG 22320 
VFDKLVDGDLSHQPTIVRLR 

22321 CGG CCCG CTG AAC AC CGCCG CCCTG CGGATGGCCT ACG CC CGG CTGGTGCGGCGC C ACG A 22380 
GPLNTAALRMAYARLVRRHE 

22381 GTG CCTG CG CACCCGCTTCCC CGTG ATCG ACGGGG AGCC CGTGCAGGTGATCGAG GGC AT 22440 
CLRTRFPVIDGEPVQVI EG I 

22441 CGGGAAAGCAGCGGGGGGCCCGCTGCCGCTCATCGATCTGCGCCACCTCCCGGAGGCGCT 22500 
GKAAGGPLPLIDLRHLPEAL 

22 501 TCGCGCGCGCGAGATCGCGAGGATCCGCGAGGAGACGCTGTCCACGCCGGTCCCCTTCGA 22560 
RARE IARIREETLSTPVPFD 

22 561 CAAGCGGCCGCCCGTCCGCGTGGCGCTGATCCGGGCGGCGCCCGAGGAGCACCTCTTCCT 22620 
KRP PVRVALIRAAPEEHLFL 

22621 CGTCGGCATCCCGCACATCACCGCGGACCTGTGGTCCGCGACCCTGCTCAACGACGAGCT 2 2680 
VGI PHITADLWSATLLNDEL 

22681 C ATGG CG CACT ACAGGGCGGGGGCCG AGGGG ACT CCCTCCCGGG CCCC CAC CC C CGTCG C 22740 
MAHYRAGAEGTPSRAPTPVA 

22741 GCAGTACGCCGACTTCGCGCAGTGGCAGCGCGCGTGGTGGAACCGGGACCGCACCGAGCG 22800 
QYADFAQWQRAWWNRDRTER

22801 GGAGGCCGGACGGTGGCGGGCGCGGCTGGACGGGCTGTCCGCCGTGGAACTGCCCCTGGA 22860
EAGRWRARLDGLSAVELPLD

22861 CCGGCCCCGCCCCGCGGGCCGCCGGCX^GACTGCTTCCrGATCC^GGAC^CCTTCGACGC 22920 
RPRPAGRRRDCFLIGDTFDA 

22921 CGAACTGAGCGACCGGCTGCGCGCCTTGGCACGCACCGCCGACGTCACGCTGTACGTGGT 22980 
ELSDRLRALARTADVTLYVV 

22981 G CTG CTGG CGG CGTTCCACTGGCTGGTGGGG CGG ATGTCGGG CG C CGGCCGGCTGGTG AC 23040 
LLAAFHWLVGRMSGAGRLVT 

23041 C ACCTCG CTCGTGGCCG CCCGG CACGGCAGCG CGGTACAGGGG ATG ACCGG C CCGTTCT C 23100 




TSLVAARHGSAVQGMTGPFS 



23101 GGACTACCTGGCCCTGGTCGGGGACCTCTCGGGCGATCTC 

DYLALVGDLSGUfc'w.r 

23161 ccccgta^a^ 

2322 1 CCTCGA^GTCATGGACCC^ 

LEVMDPGRELHFMf^^^ 

23281 CA^CCTCCAC^ 

23341 GGTCAACCCGGAGGGGGACGACGGGGAGAGCGGCGACGGGGAGTACGTGCCCTC 

23401 CGACCTGACC^CGA^^ 

23461 CGACCGGCGGC TC ^ 

235 2i gLgtgcg^^ 

23S 81 CCTGCCGCGACCGCCGTCCG^ 
23641 CGAAGGCGAGTTG^^ 
23701 GCGCCTGGCCACCGGGcW 

23761 GCCGCGTCCGAACGCGGCCCACGACATCCGCGGCCGCCTG^ 
23821 cLgXGC^GACCGTGTTC^ 

23881 CCGGGCGGCGGGCGGCGAACGGGCGGAGCCGCTGCCGCCGCCCGAGGACTGCGTCCCGCT 

RAAGGERAEPLr 
23941 TCCCGAGGAGGGCCGGCCCCCCTCGGACCCGTCCGAGCGGCGGCTGGCCGCGCTCTGGGC 

pEEGRPPSDPSERKl.rt^ 

24001 CGAGATC^^ 

24061 CGAT AAGG ACGC CCTCCG CTT CCTGG CCCG CGTGG CGG AGG ACTTCGGCGTCACCGTGC C 

24121 CTTC^ 

24181 acggagggtgtaacgcgcaAtgagtgagtggtagggtcggaatcgaaccgcactga 

R R V * 

24241 caatcttttcggtcagctgttccggatattccggggcgcgtcggcgctccctcgaccaag 
24301 ggcgtacgcggataagcgtgcgccgccccacggctgcgtctcgacgccttcatcggcgcg 
24361 tcggacacttcgcggtgccagtcgg^^ 

24421 CGAGGTGCGCTC^ 

24431 ACGGCG^^^ 

24541 OGCGGGOGG^ 

R G R D ^ 



(orf26)
24601 CTGGAC*CCGCA^^ 24 " 0

24661 Lacgaccttcacox^^ 24720 



24 



,21 GACGAGCAGCCGCTGTCCC^T^^ 24780 
DEQPLSVVDLPPSCAU^^ 

247B1 GAACT^ACCAGCTCCC^cW^ "840 
ELDELRLRERAALDPKLror 

24841 rxccoGGCcjcc^ 24900 

24901 GCCCTGGTCGCGG^CCGGCTCTCCCTCC© 24960 

249 61 AGCGGGGAGACCGTGTCCCXC^ 25020 

2S021 TGGCACCACGACCTGCTCACCGCCGAGGACGCCGCCCCC^ "080 

25081 CAC^CGCCACCG^ 25140 

25141 GGTCCGTGGCGGGCGCGGGAGTCGGAACTGCCCGCCGAACTGGTGGro 25200 

25301 GTCGCCGGGMGCTGTCCACCGATCCCGCCACCGTGCT 25260 

25261 GTCTGGCGGCTCGCCGGCGAGCGGAACCTGCCCCT 25320 



V W R 



25321 CACCCCGAACTCCGCACCGCGATCGGCGCCTTCGAGCGT^GCTCCCGCTCGTCCACGAG 25380 
HPELRTAlGAFERtLi'ij 

25381 ATCCGTCACGAGACGGCG^CGCGGAATACGCGGGCGCTCTGG^ 25440 

25441 GGCGAGGAACTCCTCGACCATTGCGACCCGGAACTGC^^ 25500 
GEELLDHCDPELLGSLDU1* 

25501 GAAG^CCCTGCTTCACCTTC^CCCACCACCAGGCCGAAACACCGGTCC^ 25560 
EGPCFTFTHHQAETPVKKM" 

25561 ATCACCmACCACCGTCCATCAGGATTCGGGT^ " 62 ° 

25621 CGACGCGACGGCGCCCGGCTCCGCATGGAACTGGGATACGACGAGG^CCGTATCGACGAG 25680 
RRDGARLRMELGYDEGK1U* 

25661 ACGTTTCCCGAGAACG^^ 25740 

25741 CCCGAGGGCCCXSGTCG^CGACATCCGCATG 25800 

25801 GAAGCGGGGCTGGGCCCCCGCGTGGAACITCCCGGCAAGG^ 25860 
EAGLGPRVELPGKAVHELr A 

25861 GAGCAGGCCGCGCGCACCCCCGGGGCGGTCGCGGTCAGCGCGGGCGAGGACGCCCTCACG 25920 
EQAARTPGAVAVSAGEDAIj 

25921 TACGCCGAACTCGACGAGCGGTCCAACCGCCTGGCACACCACCTGACCGGGCT 25980 
YAELDERSNRLAHHLTt.lii»v 

25981 AC^CCCGGCCGGCACGTCGTGGTCTCGGTCGGCCGCTCCGC(X^GCTGCTCGTCGGGCTG 26040 
T PGRHVVVSVGRSAELLVGL 

26041 CTCGGCGTGCTCAAGGCGGGTGGCGCCTTCGTCCCJGTCGACGTGGGCTTCCCCCGCAAA 26100 




LGVLKAGGAFVfvu 



Ml „ *j^444rs4r?r^^ 2i22< 



26280 



D A P A 



ill^^^^ 26520 

J66<1 ^li^ 2s,o ° 

!6701 ^^T^rfT^^rT^s^ 26750 

!67 „ 26820 

260J1 Llc^^ 26,80 

«... ^LL^^ js9, ° 

J70M Lll!^ 

27M1 111=^^ 27180 

27181 1J444^^ 27240 

„ M1 ^444rfTT?Trr^ 2,360 

,„„ LLc^^ 27,20 

DNYFVLGGDSIRtjvii 



275*1 aGGcooGoaGOCTawMioaxna 27600 

QARCVEVTVADLHRHPTVKH 

27601 TGCGCCGCGCACCTGGACGCCCGCGAGGACCTGCCG^CGCCCGTCACC^CCCrrC 
CAAHLDAREDLPRTPVitr 

27661 GCGCTGATCTCCGCCGAGGACCGGGCGCTGGTC 

27721 CTGAACCTcirCCAK^ 

LNIiLQEGMIFHRDFAAKSAV 

27781 TACCACGCCATCGCGTCCGTGCGGCTGCGCGCCCCGTTCGACCTCGCCGTGCTGCGGATG 
YHAIASVRLRAPFDLAVL.KW 

27841 ctcgtgcgc^ 

27901 TTCAGCCGCCCGCTGCAACTGGTGCACCGCGA 

FSRPLQLVHREFADPLHYEU 

27961 CTGCGCGGCAGGAGCGCCGAGGAGCAGGACGCCCGCGTCGAGGAGTGGATCGAGCGGGAG 
LRGRSAEEQDARVEEWIER E^ 

28021 AAGGAACGCGGCTTCGAGCTGCACGAGT^^ 

KERGFELHEFPLIRFMAQRL 

28081 GAGGACGACGTCTTCCAGTTCACCTACGGCTTCCACCACGAGATCGTGGACGGCTG 

EDDVFQFTYGFHHElVDGWb 






28141 GAAGCCCTGATGATCACCGAGCTGTTCAGCCACTACT 

EALMITELFSHYFSVIYDEP^ 

28201 atcgcgatcaagccacccaccgccggcatgcgcgacgccgtcgccctggAgctgg^ 
iaikpptagmrdavalelea 

28261 CTCGCGGACCGCCGCAACTACGAGTTCTGGGACTCCTACCTCGCCGACGCCACCCTGATG 
LADRRNYEFWDSYLADATLM^ 

28321 CGG CTG CC C AGGCC CGG C AC CGG ACC C CGGG CCG ACAAGGG CG ACCGGG ACAT CACCCG C 
RLPRPGTGPRADKGDRDI1K 

28381 ATCGCCGTCCCCGTCCCCAC^ 

IAVPVPTELSDGLKRVAA i n 

28441 GCCGTCCCGCTGAAGACCGTGCTCCTGGCCGCG^ 

AVPLKTVLLAAHMVVMSLYC 

28501 GGCCACGAGGACACCCTCACCTACACCGTCACCAACGGCCGCCCCGAGACCGCCGACGGC 
GHEDTLTYTVTNGRPETADG 

28561 AG C ACCG CG AT CGGG CTGTTCGT C AAC AG CCTCGCG CTC CG CGTCCGG ATG ACCGG CGG C 
STAI G h FVNSLALRVRMTGG 

28621 ACCTGGKO^C^ 

TWADLITATLESERASMPYK 

28681 CGGCTGCCGATGGCCGAACTCAAGCGCCACCAGGGCAACGAACCCCTGGCCGAGA^ 

RLPMAELKRHQGNEPLAETL 

28741 TTCTTCrrCACCAACTAra 

FFFTNYHVFHVLDRWIDRGV 

28801 GGCCACGTCGCCAACGAGCTCTACGGCGAGTCCACCTCCC^ 

GHVANELYGESTFPFCGIFR 

28861 CTGAACCGGGAGACCGGCGAGCTGGAGGTCCGCATCGAGTACGACAGCCTGCAGCTCTCC 
LNRETGELEVRIEYDSLQFS 

2 8921 GACGCCCTCATGGAGAGCGTCOSttACAG^ 

D A LMESVRDSYARVLAALVA 

289B1 GACCCCGAC<^GCGCTACGACCGGCACGAGTTCCGCTCCGACCGCX3ACCGGGCCGCACTG 
DPDGRYDRHEF R^S D R D R A A L 



29041 GCCOTCCTcicCCGCG^^ 29100
AVLTRGPEAPAADRCLHOUV 



29160 



29101 GCGC*Ca^C<*CGGACCGCCCCGAC^ 

ADRAADRPDAPAVQLDTDVi. 

29161 AGCTACGGCGAGCTCX3ACCGCCGCGCCAACCGGCTGGCCCACCACCTGCGTTCGCTCGGC 29220 

SYGELDRRAN RLAHHL,K 

29221 ATCGGCCCGG^GAGC^ 29280 
29281 CTCCTCGCGGTCCTCAAGGCGGGCGCCGCCT 29340 



29400 



29341 GAGTOCCTCGCCGCCGTCATCGCCGGGAGC^ 

ERLAAVIAGSGAAAVLHRrti 

29401 CTCGAAGGGCGGCTCCCCGCGGGCGTCCGCGCGCTCCC^ ™™ 
29461 ACCGCCACGCACGACCCCGGGCCCACCGCCACGCCC^ 29520 

29521 ACCTCCGGATCCACCGGAGAGCCCAAGGGCATCGTCGTCGAACACTOCAACGTCGTGGCC 29580 
TSGSTGEPKGIVVEHRNVVM 

29581 TCCCTCGCCGCCCGCGGCGCCCACTACGCGGCCGGACCCGGCC^^CCTGCTGCTGTC^ 29640 
SLAARGAHYAAGPGRFLLLS 

29641 TCCTTCGCC^CGACAGCTCGGTCGCCGGCATCTTCTGGACGCTGACCCAGGGCGGCACC 29700 
SFAFDSSVAGIFWTLTQfct.1 

29701 CTCGTCCTGCCCGGCGAGGGACAGCAACTCGACCCCGCCGCGCTGGTGGAGACCATCGCC 29760 
LVLPGEGQQLDPAALVETIA 

29761 CGGCAACGGCCCACCCACACCCTCGCCATC^ 29820 
RQRPTHTLAI PSLLAPVLUU 

29821 GCCGCCCCCGGCGACCTCGCCTCCCTGCGCACGGTGATCGCCGCGGGCGAGTCCTGTCCG 
AAPGDLASLRTVIAAGESt-f 

29881 GCCGAACTGGCCGCCGCCTGCCGGGACCTGCTGCCCGGGAGCACC^CCACAACGAGTAC 
AELAAACRDLLPGSTFHNEi 

2 994 1 GGCCCCACCGAGACCACCGTGTGGAGCACCGTCTGGTCCCAGGAGAACGAGCACGACGGA 
GPTETTVWSTVWSQENEHDQ. 

30001 -CCCCACCTCCCCATCGGCCGGCCGGTCGCGGGCACCTGCKTGCACCCCCGCGACCA^ 
PHLPIGRPVAGTWVHPRDHR 

30O61 GGACGCACCGTCCCCCTCGGCGTCGCCGGCGAACTCTCCATCGGCG^CGCCGGCGTGGCC 30120 

GRTVPLGVAGELSIGGAUVA 

3012! CGCGGCTACCTCGGGCGCC^^ 30180 
RGYLGRPRDTAAAFRPDPEA 

30181 ACC^CTCCCGGCGGCCGCGCC^ 30240 
TAPGGRAYATGDLGRYLPDO 

30241 AACCTGGAGTTCCTCGGCCGCGCCGACCACCAGGTCAAGMCC^ 

NLEFLGRADHQVKIRGFRVE 

30301 CTCGGCGAGATCGAGGCCGTCCTCGACACCCACCCGGAGCTC^GCGGACCATCGTCATG 
LGEIEAVLDTHPELQRT1VM 

30361 GCACGCGGCGACCACCCCGGCGACCAGGTGCTCGTCGC^ 

ARGDHPGDQVLVAYVLPAP^ 

30421 CGGCGGCCCGAACCCGCCGACATCCAGGGGTACGTCCGCG^ 30480 
RRPEPADIQGYVRDRLPRYM 

30481 GTGCCCACCGCGGTGATCGTCCTCGACGCGGTACgCTGACCGCCGCCGGCAAGGTCGAC 30540 

o 









D A V 



L T A A G 




K V D 



3054 



30601 



X ^GCCTCGCTCCC^^ 

icLcACCGACACCGAG^^ 

PGTDTERALAAIWAUvl, 



30661 CGG AT CGGGG CCGGTG A* 



CCGCTTCTTCGACGTCGGCGGCGAATCCCTGrcCGCGATGCAG 



G G E 



30721 



GCCACCGCCGCGGCCAACAAGA' 



TCTTCCGC^CCCGCGTCTCCGTCCGCCGCCTCTTCGAG 



N K M 



ICCTCCCTGCGGGAGTTCGCCCACGAGATCGAi 



CAAGGCCCGCCTCGCGGGCGGCGGG 



3 0781 GCGO 



30841 ACCGGCCTCACCGGCCCCGCGGCCGCCCCI ^ >j- G G A A E 



G G 



30901 



30961 



31021 



CCGCCGACACCACCCACCCGCTCTCGCCi 

HPLSPAQR 



EIDKARLA 

IGGCCACCGGAGGTGCCGCCGAATGACCCCGG 
M T P A 

GGCCCAGCGCAGCATGTGGTTCCTGCACCGGC 



aD TTHPL S *mw«SMWF^HRL 
TCGCGCCCGAGGTGCCCGCCTA^ 



P E 



PAY 



0000000*^^ 

cggtgIcccgt^^ 

TGCGGACCGT^ACCTCACCCACCTGACCCCCG^ 

R TVDLTHLTPAAAEAbiHi^ 

3X201 CGCTACGGTGCGCCGCC^ 



31081 



31141 



31261 



31321 



31381 



31441 



31501 



31561 



31621 



31681 



31741 



LRCAAARPF 
CCCTOCIG^^ 

TCGACGGCGGC^ 

TCGCCGGGCGCCCGGACCCCCTCGGCACAC^ 



L G T 



A A E Q D E A 



CGCCACCCCGC^ 



CCGG CCCCG ACCGCGG ACTT CTG C CGCG AG CACG 



p T A D 



H 



CCACCGTCCACTACGGCACCGACGAT' 
TVHYGTDDf 

CCGTCACCGGCTACGTGCTGCTGCTCGCGGCCCTCGCCTGCCTGGTCGCCOTGTACACCG 
VTGYVLLLAALACLVAKI 

GCCGGACGGACGTO^WtCAC^ 



T D 



IGSPVGLREDPEGLA 



C^CCGTCGGCCCGATGCTCAACCTGCTGCCGCTGCGCCT^ 



T V G 



M L N L 



P L R L 



31801 



G CTTCGG CG AGGT CCTGG CCCGCACCCGGG AG ACGCTGCTCGGCG CGCTGG AG CAC A 
FGEVL ARTRETLLGALEHKi 

U861 ^^^^ p^^^ ^ g^^^^^y^" ^Q^^^y^c^ ^ A^" p^D^ ^ 



31921 



CCCTCTTCCAGATCCTCTTCGCCCACGAAO 



F Q 



A H E 



'GCCCCCCGGCCCCACCCGCGTTACCGGGCG 
PPAPPALPGV 



(orf25)






3X981 TCCGT^CCCGCGTCGTACCCGTCCCCGC^ 

RARVVPVPAPAA^** 

320 41 ccaccg**^*^^ 

TETPDGLRki. vci 

3 ai.i cggccg^ac^ 

32 isi cgccggacacaccgctgag^ 

32221 CCG^CACCACG^^ 



322 



;81 TCGAGGAGTCCGCCGCCCGCCGGCCCGACGCCCTGGCGGTTO 

eesaarrp° a1jAV 

3234 1 TCAGCTACCGG^^ 

32401 GCATCGGCACCGAGGACGTGGTCG^CCT 

32461 CGCTCCrCGCCGTCCTCAAGGCCGGCGCCGCCTA^^ 
LLAVLKAGAAi u 

LGT pPAPAGTfvnn 
32641 CGOCGACCCGGCCC^ 

3270 1 CCTACCTCCTCTACACCTCCGGGTCGACGGGCCGGC^ 
32761 ACAGCGCCGC^ 
3282 1 CcLc TC GCCACCAC TT 

3288 1 TGGCCCACGGCGGCACCGTCG^^ 

AHGGTVVLADSALHVfAi-. 

329 41 GGGCGCCCGCGG^^ 
33 001 CCGACG^CCTGCCCGACGGTCTGACGGCCG^ 

33061 AGCTGGTCGCCCGGCTGCACGCCCGCCTGCCGAAGGCCGCCGTCCGCAACCT 
LVARLHARLPKAAVKH^ 

33121 CCTCGGAGGCCACCACCTACGCCACCGCGGCCCT^ 

33XB1 CGJCCMCokoW^ 
332.1 CCCTCCCCG^CG(X3GTCGTCGGTGAACT 

333 01 ACCTCC^CCGGCCGGGACCGAC^^ 

LGRPGPTADAFRfLif^ 

3336 i -cccgg^ctac^^ 

3342 1 TCCTCGGCCGCAAGGACGAGCAGATCAAACTCCGCGGGGTGCGCATCGAACCGGGCGAGG 



LGRKDEQIKLRGVRIEPGEV

33481 TGGAAGCCGCTCTCCGCCAGTGCGCGCCGGTCGCCGCGGCCGCCGT^TGCTCGCCGGGA 
EAALRQCAPVAAAA.VVLAUl 

3354 1 CCACCGCGGAGAACCACCGCCTCGTCG^CTTCGTCACCC^ 

33601 ACCCCGAGCGCACCCTCGCCGCGCTGCGTTCGCGC^^ 

pERTLAALRSRLPAALVPAA 

33661 CG CTGGTGGTGTGCGACGCCCTGCCGCTGA 

33721 TCGCCCGGCGGGCGCG^ 

33781 GCGTC<»^ 

VEKAVAAlWREVLGTERVUi 

33841 TCCACCAGGGGTTCTTCGACGCGGGCGGCACCT 

HQGFFDAGGTSLSLLRLHHR 

33901 GGCTGGTCGCGTCCGTCCAkcCGGCCTCCGGCTCGCC^ 

LVASVHPGLRLADVrKijf 

33961 TCGCCGCGCTCGCCGCGTTCGTGG^ 34020 






34021 ACG CGG C C CT C CGGGCCGG C CGGCGCCG CG CCGCGG TGG CCGCG CG CCG CAGG AAAGGCG 
AALRAGRRRAAVAARRRKG 



34080 



A A L R 



G 



34081 GCGGACGATGAGCCATGCCGACGCGGGCGACGGGCTCGACG^ 34140^ 
G R * 

34 141 GGCCGACGGGATCGCCGTGATCTCGCTGGGCGGACGCTTCCCCGGAGCGGACCGGGTGGA 
ADGIAVISLGGRFPGADRVU 

34201 CCGCCTCTGGACGAACCTGCTCGACCGCGAGGACGCCATCAGCCAC^CACCGCCGACGA 
RLWTNLLDREDAISHFTAUE. 

34261 ACGCCTCGCCCGGGGCCGCGACCCCGAACTGGTGCGCCACCCGCGGTTCGTCGGCGCGGA 
RLARGRDPELVRHPRFVGAt 

34321 AGGCGTCCTCGGCGACGTCTCCCTCTTCGACGCCGAGTTCTTCGGCTGCTCGCCGCGCGA 
GVLGDVSLFDAEFFGCSPRb 

34381 GGCCGAAGTCATGGACCCGCAGCACCGGCTCTGCCTGGAGGAGGCGTGGCACGTCTTCGA 

AEVMDPQHRLCLEEAWrtvr 

34441 CACCGCCGGCTACGACCCGGCGGCGACGGGCACCGCGGTCGGGGTGTTCCTCTCCGCGAG 
TAGYDPAATGTAVGVFLSAb 

34501 CCTCAGCTCGTACCTGATCCGCAACGTCCTGCCCGGCGGCGCGGCACAGCGCCTGCTCG^ 
LSSYLIRNVLPGGAAQRLL^ 

34561 cggcttcccgctgctgatccacaacgacaAggactttctggccaccaccgtgtcc^ 

GFPLLlHNDKDFLATTVSHi^ 

34621 actgggcct<^ccgggccgagtta^^^ 

lgltgpsyavgsacssslva 

34681 GGTGCACCTGGCCTGCCAGAGCCTGCTCACCGAGGAATGC^ 

VHLACQSLLTEECDMALAGG 

34741 GGTCTCGCTCCAAGTGCCGCAGGGCCAGGGGTACGTGCACGCCGACGACOTCATCTACTC 
VSLQVPQGQGYVHADDGIYS 

34801 ACCCGACXjGGCGCTGCGCCCCCTTCGACGCCGGCGCGGCGGGCACGGTGG^ 

PDGRCAPFDAGAAGTVGG^o 

34861 CGTOTGCCTCGTCCTGC^ 

VGLVLLKRLADA.VRDGDRvn 



34200 

34260 

34320 

34380 

34440 

34500 

34560 

34620 

34680 

34740 

34800 

34860 

34920 




34921 CG^GGTCATC^^ ^ 

CACCGGCCAGAGCGCCGTCC^CGC^ 
TGQSAVVAEALAVAt,-' 



3 5041 cLclIcc^ 

a cgaagtggccgcgctcacccgggcgttccgcg^ 

tl GATCAAGGCCGTGCTGGCGGTCCG^ 



35101 
3516 
35221 
35281 
35341 
35401 



I K A V LAVREGVlPGTFiii"^ 

WPEADHPRRAov^^ 
CAACGCCCACGTGATCCTGGAACAGGCCCCGCCGGCCGCCCCCC^ 

3546 i cLtgccc^^ 

35521 CCgIcCTGGCCX^^ 

RDLAAWSAr^r- 

CGAGGCCGCGCGCCTGCTGGGCGGCGCGCGCGGCGAGACCGCGCTCCCCGGCAGGGAGG^ 

EAARLLGGARGETAi, 

CGTGTTCCTCTTCCCCGGGCAGGGCACCCTCCCGCCGGACACCGGGCGCGGCCTGTACGC 
VFLFPGQGTLPPDTOK^ 

GGACGTGCCGGCG^CCGCGCCCACTTCGACG^^ 
CACCGACCTC^ 

CCTCTTCGCCGTCGAGTACGCCCTCGCCCGCACCCT^ 
CGCGATGCTCGGGCACAGCCTCGGCGAG^ 

36001 CCTGCCGGACGJX3CTGACGCTCGTCOT 

CGGCCGCATGCTCGC^TCCCGCTCACGCCGGACGACCTGCGCCCGCTGCTGC^ 
GRMLAVPLTPDDLKrij'J 

GGTGGAGTTCAGCGCCTTCAACGCCCCCGGCCGCTGCGTCGTCGGCGGGCCCCCGGAGCC 

VEFSAFNAPGRCVVOw 

GGTGGCGGAGCTGCGCGCCCGGCTGGCGCGGCGCGGAGTG^ 36240 
OGCGCACOCCTTCCACTCG^ 
3630 1 G^CG^ 
36361 OQCOCI^MixWWia^ 



35581 
35641 
35701 
35761 

35821 CA< 

35881 

35941 

36001 

36061 

36121 
36181 
36241 



35040 
35100 
35160 
35220 
35280 
35340 
35400 
35460 
35520 
35580 
35640 
35700 
35760 
35820 
35880 
35940 
36000 
36060 
36120 
36180 



36300 
36360 
36420 




ADAAVTTPAYWLAHLRRPVR 
3642 1 C^CGCCGAC^ 36480 

n gccgcgg^ 36540 



36481 

36541 «cancc^G«^^ 36S °° 

3660 X ACJCTGGC^CT^ 3 " 6 ° 

3666 x cLl^ 36720 

36721 cILgaacccacggacct^^ 36780 

3678 1 TCCGCCGCTC^ 36840 

368 4X CGCCCTGGCCCGCGACTA^^ 36900 

ALARDYLATGVEASOVi, 

. 369 0X CCACC^CCTG^ 36960 



3 



GGGGACGAT^CCGCGGAGATCACCGCGGCC^CCCGTCC^CTCCGGGCTCGTCGACCT 37020 
GTlAAElTAAHPo 

,21 GCTCCGGCACTGCGCCCAGGGCTATCCGCGC^ 37080 

70 BX CGTCCJCTATCCG^ 37140 



36961 



370 



200 



37X41 CGACCACCGCGCCACCGGC^ 37 
DHRATGRLTKL.*^^ 

37201 GGCCGACCGC^CCGGCCGCC^CTGCG 37260 

37261 CCTCACCCACKCCCT^ 37 32 0 
LTQALVTRAPGRLUX n « 

373 21 CTCCCGGCACTTCGTGACCGCACTCGGCCGGGAGGCCGCCCGGCGCGGCCT 3.380 

srhfvtalgreaa«-^ 

37 38X CCGCGCACGCGTCCTCGAC^ 37440 

RARVLDIAKfr 

37441 GTTCGACGTCGTCTGCGGCCTCGACGTGGTCC^CG^CCACCCCCGACCTGCGCACCA 37500

FDVVCGLDVVMAi 

37B01 CGGCCATCTGOGC^ 37560 
3756 1 CGAC^ 37620 
3762X CCGGOGCACCCACOG^ " 68 ° 

37S8X gLLgCCACGGCCGATGTGATCGTGC^ 37740 
DFATADVlvrr^w ^ 

3774! GCTCGCCCGGCAGACCCCC^ 37800 
37 801 CGGCACGTGGTGCTACGC^ 37860 







37861 


GACGGGCGGCTGCCTGCTGCTGGGCGACGGGGACACGGCGAAGGCCGTCGCGAGCCGGCT 
TGGCLLLGDGDTAKAVASRb 


37920 


37921 


GGAGGCCCTCGGCGTGCCCGTCACCACCGTCGGCGGCGGCCGACC3CCGGGCCCCGAGCG 
EALGVPVTTVGGGRPPGPER 


37980 


37981 


GTACCGGGAACTCGTCGGCCCCGCCACCCGCCTGGCCGTCGACCTGTCGCCGCTGCGCGA 
YRELVG PATRLAVDLWPLRD 


38040 


38041 


CGCGTCCCACCGCGGCCGCGCrcCCGGCGCCGCCGGCGTACGGACCGCCCAGGACGCCGC 
ASHRGRAAGAAGVRTAQDAA 


38100 


38101 


GCTGCACAACCTGCTCCACCTCGCCCGGGCCTTCGGCGCGCTGGAGGAGCGCCACCCCGC 
LHNLLH LARAFGALEERHPA 


38160 


38161 


CCGCGTCGTGACCGTGACCACCGGTGCCCACGACGTGCTCGGCGACGACCTCGCCCACCC 
RVVTVTTGAHDVLGDDUAHP 


38220 


38221 


CGAGCACGCCACCGTCCCGGCCGCGGCCAAGGTGATCCCCCGGGAGTACCCGTGGATCGC 
EHATVPAAA KVI PREYPW IA 


38280 


38281 


CTG C AC CG CC CTGG ACGTGG AG CCGGG CCTGG ACGCCG AG CGG CTGG CGG ACCTG A rt-b 1 
CTALDVEPGLDAERLADLIV 


38340 


38341 


CCGGG AACT CGG CGCGGCGCGCG AG ACCACCGTCACCGCCTGCCG CGG CCGACGCCGt, 1 1 
RELGAARETTVTACRGRRRt 


38400 


38401 


CACCCCCTGCCCCGTCCGGCAGCCCCTCCCCGCCGCACCGGAACGCCCGGCGGTCCGGCC 
TPCPVRQPLPAAPERPAVRP 


38460 


38461 


CGGCGGCGTCTACCTCGTCTGCGGCGGCCTCGGCGGCATCGGCCTCCACCTCGCCGAGTA 
GGVYLVCGGLGGIGLHLAEY 


38520 


38521 


CCTGGGCCGCGCCCGC^CCACCGTCXSTCCTC^CCCACraGaSGCCCTTTCCCGCCCCCGG 
LGRARTTVVLTHRRPFPAPG 


38580 


38581 


CGCGTGGGACGGGCTGCCCGCGGGACACCCGGAGGCGGCCGTCGTCCGGCGGCTGCGCTC
AWDGLPAGHPEAAVVRRLRS 


38640 


38641 


CCTCGCCGCCACCGGCGCCACGGTCGTCGTCCGCCGGGCCGACCTCACCGACCACGACGC 
LAATGATVVVRRADLTDHDA 


38700 


38701 


GATGCGCGCCCTCGCGGACGAGGTGGAACAGGCCCACGGCCCCGTCCGGGGGGTGGTGCA 
MRALADEVEQAHGPVRGVVH 


38760 


38761 


CGCGGCCGGGGTGCCCGACACCGCCGGCATGATCCAGCGTCGCGACCGAGCCGGCACGGA 
AAGVPDTAGMIQRRDRAGTD 


38820 


38821 


CGCCGCCCTCGCCGCCAAACTGACCGGCACCCTCGTCCTGGACGAGGTGTrCGCCCAt,LO 
AALAAKLTGTLVLDEVFAHR 


38880 


38881 


CG ACCTCG ACTTCCTCGTCCTGTGCTCCTCGATCGGCACCGTGCTGCACAAG C I (jAAoi i 
DLDFLVLCSSIGTVLHKLKF 


38940 


38941 


CGGCGAGGTCGGCTACGTGGCGGGCAACGAGTTCCTCGACGCCTATGCCGCCCAC^t-UL 
GEVGYVAGNEFLDAYAAHRA 


39000 


39001 


GGCCCGCCGCCCCGGCAGAACCCTGTCGATCGCCTGGACCGACTGGCGGGAGTCG^CAi 
ARRPGRTLSIAWTDWRESGM 


39060 


39061 


GTGGGCCGCCGCCCAGOTCCGTCTGACCGAGCGCTACGGCACCGG03CCGACCTGCCCGT 
WAAAQRRLTERYGTGADLPV 


39120 


39121 


ACCGCCCGGGGGCGACCTGCTCGGCGCGATCAGCCCCGAGGAGGGCGTCGACGTCTTCGC 
PPGGDLLGAISPEEGVDVFA 


39180 


39181 


CCGGCTGCTCGCCGCCGACACCGGCCCGAACGTCATCGTGTCGGCCCAGGACCTCGACGA 
RLLAADTGPNVIVSAQDLDE 




39241 


ACTCCTCGCGCGGCACGCGGCGTACACCACCGACGACCACCTCGCCGCCCTCGGCGACCT 
LLARHAAYTTDDHLAALGDL 


39300 


39301 


GAGGATCGCCGCCGCCCGGGACCGCTCCGCGCCCGCCGrcCCGTACGCGGCCCCCCACAC 
RIAAARDRSAPA APYAAPHT 


39360 






39420 
39480 



39361 GCCCKCCCAGC^ 

39421 cctcgacgac^^ 

39481 GCAGCTGCGGGACCCCTACG^^ 39540 



39600 



39720 
39780 
39840 



39541 GGTGGCGGCGCTGGCCGCCGCCACCGGCCCGCCGCCGGAAGAGACGCCC^CCAGGAAGA 
39601 GGTGGTGCTGTGACCACGCCCCGCATCACCGACCTGCTCACCGAGCTCCGCGGCCGGCAG 3 9660 

V V L M * T T P R I T D L L T B L R 0 R Q (orf23) 

39661 GTGACCCTCACGGCCGACGGGGACCGGCTGCACTGCCGCGCGCCCCGGGGCGCGCTCACC 
VTLTADGDRLHCRAPRGAi. 

39721 «C«OCK*K^ 

39781 GACCGCCGCATCCCGCGCCAC^ACGGGCCCGCGCCGCTGTCC^CGCCCAGGAACGGCTC 
DRRI pRHDGPAPLSFAQERi. 

39841 TGGCTCCTCCACCAGTTCCACCCG^^ 39900 
39901 CTGCGCGGGCCCCTGAAC^^ 39960 

39961 CACGACGTCCTGCGCACCCIMTACGCCATCAGCCGCGGCCTGCCCCGGCCCGTCGTCGAA 
HDVLRTRYAISRGLPRfvvr. 

40021 CCGGCCCACACGCCGCCGCTGCCCCTGACCGACCTGACC^GCTCCCCGCACACCACCGG 
PAHTPPLP^TDLTGLPAHHR 

40081 GACGCCGAACTCGCCCGGCTOGCCGCCCAGGAGGCCAGGCGGCCCT 

40141 GGCCCGGTGCTGCGGGCCCGGCTCCTCCGAACGGCCCCCGAGGAGCACCGGCTGCTGCTG 
GPVLRARLLRTAPEEHRIibb 

40201 ACCCGCCATCACATCGCCAGCGACGGCrGGTCGCTCGACATCCTGCTCCGCGAACTGGGC 
TRHHIASDGWSUDILLREUO 

40261 ACGTTCTACCGGGCAGGGCGGGACGGCACACCCGCCGGCCTCGACGCCCTGCCGCTGCGG 
TFYRAGRDGTPAGLDAIiriiK 

40321 TACGCCGACTOCGCCGCGTACCAGCGCGAACAGGC^^ 

YADFAAYQREQAERPETAER 

40381 TCGACCCGCTCGGCACGGCACCTGAGGG^ 

40441 CCGCCCGCCGAACCCTCCCACGCGCCGGCCGGCACCGTACGGACGGAC 

PPAEPSHAPAGTVRTDLPAA 

40501 CTCGTCACCGGCCTGCGGCAGCTGGGCGGCCGGGCCCGCACC^CGCTCTTCCCGCTCCTG 
LVTGLRQLGGRARTTLFPLL 

4 0561 CTGAGCGCCTTCGGCCTC^ 

40621 ATCCCCGTCGCCGGCCGGCCGCGCACCGAACTGGAGCCGCTCATCGGCTGCTTCGCGACC 40680 
I pVAGRPRTELEPLIGCFAi 

40681 ATCGCGCCGATGCGGCTGACGAGCGACGGGACCGAGCCGCTGACCCGGCTCGCCGCCCGC 40740 
IAPMRLTSDGTEPIiTRl-AAK 

4074 1 GCCCAGCAGCACGTCCAGGACGCGCTGGACGGACCCGACGTCCCCTTCGAGCGGCTCGTG 
AQQHVQDALDGPDVPFERLV 




40020 
40080 
40140 
40200 
40260 
40320 



40440 
40500 
40560 
40620 



40800 





40801 


CACGCGCTGCGTCCGGAGCGGGACCTCGCGGAGAACCCCCTGTTCTCGGCGTCGTTCGCC 
HALRPERDLAENPLFSASFA 


40860


40861 


TTCCAGAACACCCCGCXX3ACCGCCGTGCGCCT 

FQNTPRTAVRLPGLDAEVLP 




40921 


TCGCCGCCCGTGGCCCCCAAGTTCCCGCTGGCCCTCACCGCGACGGCGCGGGCCGACGGC 
SPPVAPKFPLALTATARADG 




40981 


GGAATGGGCCTGGAGCTGGAGTTCGACCGGGACCGGATCGCCGAGCCGGTCGCGCGGGGG
GMGLELEFDRDRIAEPVARG 


41040


41041 


ATCCTCACGTCCTTCCACGCCGCCCTCGCCCGCGCGGTCGCCGACCCCGAGGCCCCGGCG 
ILTSFHAALARAVADPEAPA 


41100 


41101 


GCGCCCGTACCGGCCGCCGCCGTGGACCGGCGGCCCGGGCX3CGAAGGACACGAGTGCCTC 
APVPAAAVDRRPGREGHECL 


41160


41161 


CACGAGCCGGTGGCGCGGGCGGCGGCACGCCACCCCGACGCCGTCGCCGTCAGCTGCGGC 
HEPVARAAARHPDAVAVSCG 




41221 


GGCACCCAGCTCAGCTACGGGGCGCTCGACACCCGCGCCGAACGGCTGGCCGCGGTGCTG
GTQLSYGALDTRAERLAAVL 


41280


41281 


CGCGCCCAC^CGCCGGCCCCGAGCGGCTGGTGGCCCTGTGCCTGCCCACCGGCCCCGAA 
RAHGAGPERLVALCLPTGPE


41340 


41341 


TGGGTCGTCGGCGCGCTCGCCATCCTCAAGTCCGGCGCCGCCTACCTGCCGCTCGACCCC 
WVVGALAILKSGAAYLPLDP 


41400


41401 


GGCGACCCGGCCGAGCGCCGCGCCTCCGTCGCCGCCGACGCGGGAGCGACGCTGATCGTC
GDPAERRASVAADAGATLIV


41460


41461 


TCCGACACCGCGCTTCCCCCGCTCCACCGCGTCGACGTCACGGCCACCCTCCCGGACGGC 
SDTALPPLHRVDVTATLPDG


41520 


41521 


GCCCCCGAGCCCACCGCCCGGGCCGTCCTGCCCGGCAACCTCGCCTACGCCGTCTACACC 
APEPTARAVLPGNLAYAVYT 


41580


41581 


TCCGGCTCCACCGGCGGCCCCAAGGGCGTGCTCGTCACCCATGCCAACGTCACCGGGCTC 
SGSTGGPKGVLVTHANVTGL 


41640


41641 


CTGGCCGCGTGCCGTGAGGCCCTGCCCGCCCTGGACGCCCCCCGGACCTGGTCGGCGACC 
LAACREALPALDAPRTWSAT 


41700


41701 


CACTCGCCGGCCTTCGACTTCTCCGTCTGGGAGGTCTGGGGCCCGCTGACCGCCGGCGGA 
HSPAFDFSVWEVWGPLTAGG 


41760


41761 


CGCCTCGTCCTCGTGCCCCCGGACGTGGCCCGGGCCCCGGACGAACTGTGGGACACCCTC 
RLVLV pPDVARAPDELWDTL 


41820 


41821 


CGCGACGAACAGGTCGAAGTCCTCAGCCAGACCCCCAGCGCGTTCCACCACCTCCTGCCC 
RDEQVEVLSQTPSAFHHLLP 




41881 


ACCGCCGTGCGCCGGGCGGCCCAGGCCACCGCGCTCGAACTCGTCGTCCTGGGCGGCGAG 
TAVRRAAQATALELVVLGGE 


41940


41941 


GCGTGCGAGCCCGCCCGTCTGACGCCTTGGTGGGACGCCCTGGGCGACCGGCGCCCGGCC 
ACEPARLTPWWDALGDRRPA 




42001 


GTGGTCAACATGTACGGCATCACCGAGAACACCATCCACGTCACCGTCCGCCGGATGACG 
VVNMYGITENTIHVTVRRMT 




42061 


GCGGCGGACCGGTCGGGCAGTCCCGTCGGCCGGCCGCTGCCGGGGCAGCGCGCCGACCTT 
AADRSGSPVGRPLPGQRADL




42121 


CTCGACCCCCACGGCCGGCCCGTCGCGCCGGGCGGGCGGGGCGAACTGTTCGTCGGCGGC 
LDPHGRPVAPGGRGELFVGG 


42180 


42181 


GTCGGACTGGCCCGCGGCTACCTCGGCCGGCCCGGCCTCACCGCCCGGAGCTTCCTGCCG 
VGLARGYLGRPGLTARSFLP 


42240 


42241 


GACX3ACACCCCCGGCTGGCCGGGCGCGCGCCGCTACCGCTCCGGAGACCTGGCCCGGCTG 
DDTPGWPGARRYRSGDLARL 


42300 






42361 
42421 
42481 



423 01 CTGCCCGACGGCGGCCTGGA^ 

LPDGGLDYAGRSU^v 

TACCGCGTCGAGCCCGCCGAGA^ 

YRVEPAETEAAALTHPAvn 

TGCGTGGTCGTGCCACGCGGCGACGGCGACCGGCGCCATCTCGCGGCGTACGTCGTCGCC 

CVVVPRGDGDRRHbAAi 

GACACCCGCGCC^CGACC^ 

DTRACDGPGLRl"^ 

42 S<1 CACCTGGTGCCGGCCTCGG*^^^ 

HLVPASVVFLKKir" 

426 oi crcwceioooQ^ 

42661 fc «5«^ 

4272 1 GAGACCATCGGGACGCACGACAACCTCTTCGACCTGGGCGGCGACTCCCTGACGGTCACC 

ETlGTHDNLFDLGOuai, 

42781 CAGTTCCACTCCCGGGTGGTCGAGGAGnCGCCGTGGACCT 

42841 Lggccct^ 
429 oi cgcaccgcggtactgcgc^ 

4296 i ggggagtccggcggtaatccggaggagtccgccgctacggcgcgggggcccgccgtcg 

GESGGNPEESAATAK^f^ 

43021 GCGAACGAACCCGGCGCTGCGGCGCGTGAGTCCGGC^ 

ANEPGAAARESOArt*-* 

4 3 081 GCAGTACAGGAGTCCGCCGCTACGAAGGGGGAGCCCGGCACCGC^ 

43141 ^GAG^^^ 

4320! GCCGCCACACCGCAG^ 

43261 GAATGAGCCGGCCGGCCGGCATCGTCGACATCGCGCGCCGTCACGCCGAGCGCACCC^ 43320^ 

MSRPaGIV 
E * t 

43321 cccgtcccgcgtacgcg^cctgcccgacggcgaga^ 
433B1 ccgacatcgaccggcgggcccgcgccgtg 

43441 gggagcgggtcctggtcgcctatccctccgggcccgagtacgtccaggcgttcctgg^ct 

ERVLVAYPSGrti v 
435 01 GCCTGTACGCGGGCGTGgW^ 

43 561 AACGGCTCGCCGGGATCCGCGCCGACGCCCG^ 

43621 CCGAGGCCGGGCTCGCCGGCCTGGCCAC^ 



42360 
42420 
42480 
42540 
42600 
42660 

42720 

r 

42780 

42840 

42900 

42960 

43020 

43080 

43140 

43200 

43260 



EAGLAGLATL 



43681 CCGGGGCCTGGACCGACCCCXSTCGCGGGACCGGACG^ 

GA WTDPVAGPDALAFLQXi=> 



43380 
43440 
43500 
43560 
43620 
43680 
43740 






43741 


CCGGATCGACCCGCCGCCCCCGCGGCGTCATGGTCGGCCACGGCAATCTGCTGGCCAACG
GSTRRPRGVMVGHGNLLANfc 


43800 


43801 


AGCGCTGCATCGCCGCCGCCTGCGGCCACGACCGGGACTCCACCp'CGTGGGATGGGCGC 
RCIAAACGHDRDSTFVGWAP 


43860 


43861 


CGTTCTTCCACGACATGGGCCTGGTCGCCAACCTCCTCCAGCCCCTCTACCTCGGGTCCC 
FFHDMGLVANLLQPLYLGSL 


43920 


43921 


TGTCGGTGCTGATGCCGCCGATGGCCTTCCTCCAGCGCCCGGCC^CTCGCTGCGGGCCG 
SVLMPPMAFLQRPARWLRAV 


43980 


43981 


TCTCCCGCTACCGGGCGCACACCAGCGGCGGCCCCAACTTCGCCTACGACCTGTCTGTCG 
SRYRAHTSGGPNFAYDLCVD 


44040


44041 


ACCGGGTCGGCGAGGACGAGCGGGCCGGACTGGACCTGTCG^CTGGAAGGTCGCCTACA
RVGEDERAGLDLSGWKVAYN 


44100 


44101 


ACGGCGCGGAACCTGTACGGGCCGACACCCTGCGACGGTTCACCGACCGCTTCGCCCCCC 
GAEPVRADTLRRFTDRFAPH 


44160 


44161 


ACGGCTTCACCCCCGGCGCGCACTTCCCGACCTACGGGCTCGCCGAGGCGACCCTGCT1LO 
GFTPGAHFPTYGLAEATLLV 


44220 


44221 


TCGCCACCGGCCCCAAGGGAGTGCCGCCCCGCACCCTGACCGCCGACCGCGCCGCCCIU^ 
ATGPKGVPPRTLTADRAALR 


44280 


44281 


GCGCCGGCCGGCTCCGGCCCGCCGGGCCCGGCGAGGCCGGCCTGGAACTGGTCGGCAAC^ 
AGRLRPAGPGEAGLELVGNG 


44340 


44341 


GCACCGCCGGCCTCGACACCACCCTCCGGATCGTCGACCCCGCGACCGCGCGGGAGTGCC 
TAGLDTTLRIVDPATARECP 


44400 


44401 


CGCCCGGAGAGGTCGGCGAGGTCTGGGTGCGCGGCCCGGGCGTGGCACGCGGCTACTTCG 
PGEVGEVWVRGPGVARGYFG 


44460


44461 


GCCGCCCGCGCGAGTCCGCGCCGCTGCTCGCCGCCCGCCTGCCCGGCGGCGAAGGACCGT 
RPRESAPLLAARLPGGEGPY 


44520 


44521 


ACCTGCGGACCGGGGACCTGGGCGCCCTGCACGACGGGGAACTCTTCCTCACCGGACGCC 
LRTGDLGALHDGELFLTGRH 


44580 


44581 


ACAAGGACCTCATCGTCATCCGCGGCCAGAACCACCACCCGCACGACCTCGAACGGACCG 
KDLIVIRGQNHHPHDLERTA 


44640 


44641 



CCGAGCAGGCCCACCCGGCGCTCCGCCCGACCTGCGCCGCCGCGTTCGCGGTGCCCGGGG 
EQAHPALRPTCAAAFAVPGD. 


44700 


44701 


ACGGCGCGGAGCGGCTCGTGCTCGTCTGCGAACTCACCTCCTACCGCGCCGTCGACCCGG
GAERLVLVCELTSYRAVDPA 


44760 


44761 


" — — _MMM»*»MMMMMMMMMMM* MM M ftf^l'T' ff^ f* Cf^Cl A 

CCGCCGTCGCCGAGGCCGTCCGGGCCGCGCTCGCCGCGCGGCACGGCGTCGCCCCOLA^A 
AVAEAVRAALAARHGVAPHT 


44820 


44821 


CGCTGGTGGTGCTGCGCCGCGGCGGCATCCCC^AGACCACCAGCGGAAAGGTGCGOLbt-U 
LVVLRRGGIPKTTSGKVRRG 


44880 


44881 


GCCACTGCCGGACGGCCTACCTCGACGGAACGCTCCCCGTTCACACGGCCGTCCGCCTCQ, 
HCRTAYLDGTLPVHTAVRLP 


44940 


44941 


CGGCGGGGGAGGAGGGCACCGAGGCCCTTCCCCTGACCACGGACCCCGGTCGGCTGGCCA 
AGEEGTEALPLTTDPGRLAT 


45000 


45001 


CGGCGCTGCGCGACCTGGCCGCCGCCCACGCGGGCCTGGCCGGGCCCCTCCCCGGCACCG 
ALRDLAAAHAGLAGPLPGTD 


45060 


45061 


ACGAGCCGGTGAGCGCCCTCGGCCTGGACTCGCTCGCCTCCCTGCGGCTCCACCACCACG 
EPVSALGLDSLASLRLHHHV 




45121 


TCCAGTCCGCCTACGGCGTGACCCrTGCCCGTCACCXSCCCTGCTCGGCGACACCACTTACC 
QSAYGVTLPVTALLGDTTYR 


45180 


45181 


rLAELTLAA pRPARAPEGQV 


45240 
45241 TCACCGGCGTCTGGC^CCGrTCACGCACGGG^GCGCGCC^GTAC^^GGC^C 45300



45360 



4530! TCGCCCCGCACGCGGCCGC^^ 

APHAAAYHLVRALALRGPVU 

45361 ACGAGGAGGCCCTCGCCGAGG^ 45420 
EEALAEAVRRVVRRHPALKi 

45421 CCCGCTTCGOTCT^ 45480 
RFALRDGEPARRTEPYGPEL 

45481 TGGACGTACGCGACGCCA^ 45540 
DVRDATGLPADRLREHLAAA 

45541 CC^CGACCG^^ 45600 

45601 GCACGGACGGCGGCCACATCCTGCTGCTGGTC 4 5660 
TDGGHI LLLVAHHLVADFWb 

45661 CCCTCGTCGTCCTCCTGGGCGACCTCGCCCGGGCCCACGCGGGCGAGGACCTGCCGCCCG 45720 
LVVLLGDLARAHAGEDLPfA 

45721 CGCCGGAGGGGGACCCCGGCGACGAGGCGACGGACGCGGACCGGACGTACTCGCGG^CC 45780 
PEGDPGDEATDADRTYWRHR 

45781 GGCTCGCCGACGCGCCACCCGCCCTCGACCTGCCCACCGACCTCCCCCACCCCGCCGAGC 45840 
LADAPP ALDLPTDLPHPAER 

45841 GCGGCTTCGCCGGCGCCACCCACGCCTTCCGGCTGCCCCCGGACCTCACCGCCCGGCTGA 45900 
GFAGATHAFRLPPDLTARLT 

45901 CCGCCCTCTCCCGGGAACGGCACTGCACCCTCTTCACCACCCTCCT^ 45960 
ALSRERHCTLFTTLLAAHQL 

45961 TACTGCTCCACCGCCTGACCC^GCAGGACGACCTCGTCGTGGGC^CCCTCCTCGCCCGCC 46020 
LLHRLTGQDDLVVGTLLARR 

46021 GCGACACCGCCGAAGCGGCCGGCGCCGTCGGCTACCTG 46080 
DTAEAAGAVGYLVNPLPLRS 

46081 CCGTACGGGAGCCGGGGGAGACCTTCACGGAACTGCTGCGCCGCACCCGGCGGACCGTGC 46140
VREPGETFTELLRRTRRl vi, 

46141 TGGACGCGGTCGCGCACGGCCGC^^ 46200 
DAVAHGRHPFGPLVSRLAPA 

46201 CGCGCACGCCCGGCCGCGCGCCGCTCCTGCAGAGCCTGTTCGTGCTCCAGCGCGAGTACG 46260 
RTPGRAPLLQSLFVLQREYG 

46261 GCGACGAGGCGGACGGGTACCGCGCGCTCGCCCTGGGCGTCGGCGGCCGGCTGCGCGTCG
DEADGYRALALGVGGRLRVG 

46321 gcggactcgAcctcmaggcactcgcgttgccgcgccgctggtcgcagctcgacctctcgc 

GLDLEALALPRRWSQLDLSL 

46381 tgagcatggcgcggctcggggacgggctgAcgggggtgtgggagtaccgcaccg^^ 

smarlgdgltgvweyrtdlf 

46441 tcaccgaggccacggtcgcggagctgagcgaggcgttcgtccacctgctgcgggcggccg 
teatvaelseafvhllraav 

46501 TCGAGGACCCGGGCGCGCCOTTGGAGACX3CTGCCGCTCACCGGCGGCCGGGAGACCG 

edpgapvetlpltggretgp 
4 6561 cgcgccgcggcccgtcggcggcccggcccgccctcccgctgcaccggctcgtggccgcgg 

RRGPSAARPALPLHRLVAAA 

46621 CGGCGCGCCGCGATCCCGCACGGACGGCGGTCGTCGCACTCGCCCCGGACGGCACCGCCC
ARRDPARTAVVALAPDGTAH 

46681 ACCAC^TCAGCCACCK3AGCCCrGCACCGCGCGGCC^CCACCCTCGCCGCCCGGCTCCGCC 



46320 



46380 



46440 



46500 



46560 



46620 



46680 



46740 
HISHGALHRAATTLAARLRR

46741 GGGAGGGCGCCGGCCCGGAGCGGCCCGTCGCCGTGCTCGTCGAGCGG^CCCCTCGCTGC 46800 
EGAGPERPVAVLV ERGPWi,r 

46801 CCGTCGCCTACCTCXjGCATCCTGCACGCCGGGGCCACCGTCCTGCCCCTGGAC^ 46860 

46861 ACCCCCCGC^CAGGCTCGCCCGGACGArCGCGAACTCGGGGGCGCC^CTGCTGCTCACCG 46920
ppHRLARTIANSGARLLLTE 

46921 AGACCGGGACCGCCTCGCGCGCGGCCC*GGCGGC^ 

TGTASRAAEA-AGPGVRAiii 

46981 TGCGTGAGGGTGCCACCGGCGGCGAGCGG^CTCGGCGGACGTCCACCCCGAGCAGTCCG 4704 

REGATGGERFSADVHPby&A 



4 



0 



0 



47041 CGTACCTGCTGTACACCTCCGGGTCGACGGGCGACCC^ 47100
YLLYTSGSTGDPKGVLVPHR 

47101 GGGCCATCGTCAACCGCCTCCTGTGGATG^GGAGACCTACCGGCTGCGCCCGGGGGAGC 47160
AIVNRLLWMQETYRLRPGER 

47161 GGGTCCTGCAC^GACGCCGGTGACGTTCGACGTCTCGATGTGGGAGCTGCTOTGGCCGC 47220 
VLHKTPVTFDVSMWELLWPL 

47221 TGACCGCCGGGGCGACCGTCGTCATGGCCCGGCCCGGGACCCACCGCGACCCCGCGCGAC 47280 
TAGATVVMARPGTHRDPARL

47281 TCGTCCGGCGGATCGCCCGCGAGGCCGTCACCACCGTGCACTTCGTCCCCTCGATGCTCA 47340
VRRIAREAVTTVHFVPSMLT 

47341 CCCCGTTCCTCACCGAGCTCGCCCGCGGCACGACGCGGCTGCCCGCGCTG^ 47400 
PFLTELARGTTRLPALRRVV 

47401 TGTGCAGCGGGGAAGAGCTGCCCGCGGCCGCGGTGAACCGCGCCGCCGGACTCCTCGACG 47460 
CSGEELPAAAVNRAAGLLDA 

47461 CCCGGCTGTACAACCTCTACGGCCCGACCGAAGCCGCCGTCGACGTCACCGCCTGGCCCT 47520 
RLYNLYGPTEAAVDVTAWPL. 

47521 GCCGCCCGCCCGAGCCGGGGCCGGTGCCGATCGGCCTGCCCATCGCC^CACCACC^ 47580 
RPPEPGPVPIGLPIANTTI b 

47581 AGGTCCTCGACGGCCGGCTGCGCCCGCTGCCCCGCCCGGTGCCCGGCGAGCTGTACCrrGG 47640 

VLDGRLRPLPRP vPGELYL,Cj 

47641 GCGGCGCCTGCCTGGCCCATGGCTACCAC^CGACCCGGCCCTGACCGCCGCGCGCTTCC 47700 
GACLAHGYHHDPALTAARFL 

47701 TTCCGGCCCCCGGCGGCGGGCGCCGCTACCGCACCGGGGACCTCGTCCGCCAACGGGCCG 47760 
PAPGGGRRYRTGDLVRQRAU 

47761 ACGGGGCACTGGTGTTCCG^GACGCACGGACGACCAGGTGAAGATCGGCGGCATCCGGG 47820 
GALVFRGRTDDQVKIGGIKV 

47821 TCGAGCCCG^GAGGTGGCGGAGGCGC^^ 47880 
EPGEVAEALRALPGVADAAV 

47881 TCGTCCCGCACGACGGGCGGCTGGCGGCGTACGCGGTCGCCGACCCGGTCGGCCCGGCCC 47940
VPHDGRLAAYAVADPVGPAP 

47941 CGGCGGCGgAcGCCCTGCGGGACGCGCTGCGCAGGCGGCTGCCCGGCCACCTGGTGCCCG 48000 
AADALRDALRRRL pghlvpa 

48001 CCGCCCTCACCCTGCTGGACCGGC^ 48060 
ALTLLDRLPLTPAGKLDRRA 

48061 CGCTGCCCCACCCGTCGGcicCGCCCCC(^ACGGCGGAC^CCGCCCACGACCGGGACCG 48120 
LPHPSAPPPDGGRPPTTGIfc 

48121 AACGGCTCGTCGCCCGGGTGTGGGCCGAACGCCTCGGACGGGAAGTCGTCGGCGTGGACC 48180
RLVARVWAERLGREVVGVDR 

48181 GGGACTTCTTCTCCCTGGGCGGCGACTCCGTCCGGGCCCTCGGCGTGACGGCGGCCCTGC
DFFSLGGDSVRALGVTAALR

48241 GCGCCX3CCGGGCTCCCGGTGACCX3TCACCGACCTCCrrca;CC^

AAGLPVTVTDLLRLPTVAAL

48301 TCGCCCGCCACGCCGACGAGCGGGCGGATCGCCGACCGGCGCGACAGGAGACGCCCCCCG
ARHADERADRRPARQETPPG

48361 GGCCGTTCGCCCTCTGCCCGGAAGCCGCCGGCGTGCCCGGCCTGGAGGACGCCTACCCGA
PFALCPEAAGVPGLEDAYPM

48421 TGTCGATGGCCCAGCGGGCCGTGCTCTTCCACCGTGACCACAACCCCGGCTACGAGGTCT
SMAQRAVLFHRDHNPGYEVY

48481 ACGTC^CCAGCGTCGCCGTCTCCACGCCCCTCGACCGC^C^CGGCTC^CCGCGGCCGTGG
VTSVAVSTPLDRTRLAAAVD

48541 ACCGGCTGCTGGACCGGCACGCCTATCTGCGGTCCTCCTTCGACCTCGTGTCCCACCCGG
RLLDRHAYLRSSFDLVSHPE

48601 AGCCCACCCAGCTCGTCTGGACCCACCTGCCCACCCCGCTCGAGGTGGTGGAGTCGTCCG
PTQLVWTHLPTPLEVVESSD

48661 ACCCCGCCGGTTTCGACGCGTGGCTGCACGCCGAACGCAAGCGCCCCCTCGACGTCGGCA
PAGFDAWLHAERKRPLDVGT

48721 CCGGACCGCTGGCCCGGTTCACCGCGCACGACGCGGGAGCCGCCGGATTCCGGCTGACCG
GPLARFTAHDAGAAGFRLTV

48781 TCAGCAGCTTCGCCCTCGACGGCTGGTGCGTGGCCACCGTGCTCACCGAACTGCTCCGCG
SSFALDGWCVATVLTELLRD

48841 ACTACTGGTCCGCGCTGCGCGGCGCGCCCCTCAGCCTCCCGGCACCCGCCGCCTCCTACC
YWSALRGAPLSLPAPAASYR

48901 GCGAGTTCGTCGCCCTCGAACGCGCCGCCCAACACGATCCGGCGCACCGGGAGTTCTGGC
EFVALERAAQHDPAHREFWR

48961 GGACGGAGCTCGCCGGTGCCCGGCCGCATCCGCTGCCCCGCCGCCCGGTGCCACCGCCCG
TELAGARPHPLPRRPVPPPG

49021 GGCCGGACGGGATCCGCCAGCACCGTCACGTCGTCCCCGTCGAGGACACCGTCGCCAAGG
PDGIRQHRHVVPVEDTVAKG

49081 GCCTGTCGGCGCTCGCCGGCGAGCTGGGTGTCGGGCTCAAACACGTTCTGCTCGGCGTCC
LSALAGELGVGLKHVLLGVH

49141 ACCTGCGGGTCGTCCGGGCCCTGTCCGGCGACCCCGACGTCATCACGGCCGTGGAGACCC
LRVVRALSGDPDVITAVETH

49201 ACGGCCGCCTCGAACGGCACGACGGCGACCGCGTCCTCGGGGTGTTCAACAACATCCTGC
GRLERHDGDRVLGVFNNILP

49261 CGCTGCGGCAGCGGGTGGACGGCGGGAGCTGGGCCGACCTGGCCCGCGCCGCGCACGCCG

LRQRVDGGSWADLARAAHAA

49321 CGGAGGCGCGGACGGGGGAGTACCGCCGCTATCCGCTGGCCCAGGCACAGCGCGACCACG
EARTGEYRRYPLAQAQRDHG

49381 GCGCGGCCG^CTCTTCGACACCCTCTTCGTGTTCACCC^CTTCCACCTCTACCGCGCGC

AAGLFDTLFVFTHFHLYRAL

49441 TGGCCGACCTGGACGGCATGGCGGTCTCCGACCTGCGGGCCCCCGACCAGACCTACGTAC
ADLDGMAVSDLRAPDQTYVP

49501 CGCTCACCGCCCACTTCAACGTCGACGCCACGGACGGCGGCGGCCTGCGGCTGCTGCTGG
LTAHFNVDATDGGGLRLLLE

49561 AGTCGGACCCGCGGGAGTTCCCCGACGAGCAGGTCGCGGAGTTCGCCGCGTACTACCGCC
SDPREFPDEQVAEFAAYYRR

49621 GCGCGCTGCGGGCCGCCGCCGACGCCCCGCACCGGCCGTACCGGGACACGCCGTTGACGG
ALRAAADAPHRPYRDTPLTD



48240 



48300 



48360 



48420 



48480 



48540 



48600 



48660 



48720 



48780 



48840 



48900 



48960 



49020 



49080 



49140 



49200 



49260 



49320 
49440



49500 



49560 
! CCCCGGC^

1 *cLcg^ 

. GGCCGGACA^ 

,«i tggJccctccacgcc^a^ 
9981 ggLg^^ 

X ACACCCCGCTC^GGCCTGCG^ 

tcJcccccJtg™ 

CGATCGGCCACTACCG^^ 

tcgacL^ 

CCGACTCCGGCGACCCGGGTGCCCTCGGCGCG^ 
TCAAGATCACCCCGGCCCATCTGGCCGCCCTCGCCCAC^^ 
TGCGCACCGTCGTGGCCGGGGGCGAACCCCT 
CCTTCGCGCCCGGCGCCCGGCTCGTCAAC^ 

GCTGTGCCCACGACGTCGCACCGGACCCCGGCGAGGCGCCCATCCCCGTCGGTACCCCGA 

CAHDVAPDPGEAFif 

TCGCGGGCCTCAGCGCGTGCGTCGTCGACGACGOT 

ccgag!^ 

CCGCCGCCGCCTACGTGCCGGACCC^^ 

AAAYVPDPAAPGARKXKi 

TGAAGATCCGCGGCCACCGGGTGGAACCGGGGGAGGTCGAGCAGGTGCTCGGCGGCCACC 

KI RGHRVEPGEVbyvjj 

CC^GGTG^ 

TGCTCGCCGACCGGCTGCCGCCGTACGCGGTCCCCG^ 

LADRLPPYAVPAELVRb^M 

TGCCCACCACCCCCAACGGCAAGGTCGACCACACCCGGC^CCCGC^ 



4968 
50161
PTTPNGKVDHTRLPAAGRDR



51181 GGCGACTGGCGGAACTGCTCGACCGGATCGAGG CACTGTCCGACG CCGAGG CGGCCTCGG 
RLAELLDRIEALS .DAEAASA 

51241 CACTGCGCGACAGCCGGCCCGCACCCGGGAGTGGCGATGACCGAGCATGACGACCACCCG 
LRDSRPAPGSGDDRA* 

51301 CCGGCCCGCCGGGGCCCCGCCGGTTCCGC^ 

51361 GTGCCGGTGCCCGGGCATGACX3ACCGCGTCGGACGGCTGCCGGCGGACCGGAGCGTCCCG 
51421 CCGACCCGCCGATTCTCTGGGGACCCCGCCGGl^CCGGTGGTGGCCCGCCCG 

51481 CCGGAGGTGCCGATGCGCGGGCATGACGACCGCGTCGGACGGCTGTCGGCGGACTGGAGC 51540 

MRGHDDRVGRLSADWS (orf21) 

51541 GTCCCGCCGACCCGCCTGCCCGCCXSGGGACCCGGCCGGTTCCGTCGGCCCCGGCGGAGGC 51600 
VPPTRLPAGDPAGSVGPGGG 



52321 GCGGCGGGCTTCGACCTCGCCCGCGCCCCGCTGATGCGGCTGACGCTCTTCCGCGAGGGC 
AAGFDLARAPLMRLTLFREG 

52381 GAGC ACG CGT ACCG CTG CGTGTGG ACCC AC CACCACCTCGTCCTCG ACGG CTGGTC CCAG 
EHAYRCVWTHHHLVLDGWSQ 

524 41 CAGCTCGTCCTGCGCGACGTCCTCGACTGCTACATGCGCCTGCGCGCCGGACGCGGCGCC 
QLVLRDVLDCYMRLRAGRGA 

52501 GAG CCGCCCG C CCGG CCGTCCTTCACCGGTCATCTGCG C CGG CTGG AG CGG C AGG ACGGG 
EPPARPSFTGHLRRLERQDG 

52561 ATCGACGAGGAGTTCTGGCGCGACCACCTCGGCGGCCTGCCCGCACCCTCCCGCGTCGCC 
I DE E FWRDHLGGLPA PS RVA 

52621 GGTCCCGG CTG CCGCG ACGGCCGGGTGGTCGCCGTACGGCGCGCCGAGCACCGGCACCGG 
GPGCRDGRVVAVRRAEHRHR 



51240 

51300 

51360 
51420 
51480 



51601 CCGCCCGTCCCGCACGAGGAGGTG ACG ATGTCGGAGTATG ACG ACCG CCTCGCGCGGCTG 
PPV PHEEVTMSEYDDRLARL 

51661 T CGG ACAAC CAG CGCGCCCTG CTGGACCG CTGG CTCG CCGAGG AC CCCG CCGGCGGTG C C 
SDNQRALLDRWLAEDPAGGA 

51721 GGCCCGCTTCGCCCCGACGGCCGCCCGCCCCGCACCGAGGCCGAGCGGATCCTGGCCGGG 
GPLR PDGRP PRTEAERI L AG 

51781 G TCTGGG AGG AG GTGCTGG AG AC CGG CGGG ATCGG CG CCG ACG ACG ACT ACTTCGCGCT C 5184 0 
VWEEVLETGGIGADDDYFAL 



51660 
51720 
51780 



51900 
51960 
52020 
52080 



51841 GGCGGAGACTCCGTCCACGCCATCGTCATCGTGGCGAAGGCCCGGCAGGCCGGACTCGCC 
GGDSVHAIVIVAKARQAGLA 

51901 CTG AC CG C CC ATG ACCTCTTCG AGG CC AGG ACCCTCGCGGC CGTGG CGCGG AG AG CCG C C 
LTAHDLFEARTL AAVARRAA 

51961 CCGG C CGG CCCCG CCGAG CCCGTCCCCG ACG CGG GCGGCGG CG CGGT CCGGT AC CCGCTG 
PAGPAEPVPDAGGGAVRYPL 

52021 ACCC CT ATG CAG CAGGG CATGCTCTACCACTCGG CCGG CGGC AGCACGCC CGGCGCCT AC 
TPMQQGMLYHSAGGSTPGAY 

52081 GTGGTG CAGGTGTGCTG CCGGCTG ACGGGGG ACCTCG ACGTGG CCG C CTT C CGC ACCG C C 52140 
VVQVCCRLTGDLDVAAFRTA 

52141 TGG C AGG C CGTGCTGTCCGCCAACCCGG CG CTGGCCGTCTCCTTCC ACTGGTCCGACGG C 52200 
WQAVLSANPALAVSFHWSDG 

52201 TCCCCGCCCGAGCAGGTGGTGGACCCCGACGCGCGCGTCACCGTCGACACGGCCGACTGG 52260 
SPPEQVVDPDARVTVD TADW 

52261 CGGGACCGCACCCCGGCCGAGCGGGACGATGCCTTCGCCCGCTTCCTGGACACCGACCGC 52320 
RDRTPAERDDAFARFLDTDR 



52380 



52440 



52500 



52560 



52620 



52680 
LARSDVAEPVNIGSEERVDI




54181 


TCGCGTCGCTCGTCGAGCGGATCGCCGGGGTCGCCGGGAAGAAGGTGCGCTGCOCC i i c _ 
A S LVER IAGVAGK KVRCA FA 


54240 


54241 


CCCCOSACCGCCCOT^ 

PDRPVGPRGRVSDNTRCKr-i- 


54300 


54301 


TG CTCGGCTGGG C ACCGGAG ACGTC CCTCG CGG CCGGC CTGG AGCGC ACCTACCCGTGG A 
LGWAPETSLAAGLERTYPWi 


54360 


54361 


TCGAGCGCCAGGTCCTCGCCGAGGCCXK3GAGGGCCGATGCCTGAGCACCGCACACCGGTG 

M 

ERQVLAEAGRADA* 


54420 
(orf 19) 


54421 


AAGGACCTCGGCCGGCTGCTGCTCGGGCACGCCGCGCGCTTCCGGGGCCGCGAGCTG(-A^ 
KDLGRLLLGHAARFRGRELQ 


544 80 


54481 


G ACGTCGCC ACCCGGGCG CTG CGGG CCTC CGGCGGGG AG AACG CCTGGGTGGTGTC CG^ _ 
DVATRALRASGGENAWVVSV 


5454 0 


54541 


GTCAACACCAGTCTCCGCGCCCGCCAGGCCGTGGACCACGCGCTGCGGCTCGCCCCCCGL 
VNTS LRARQAVDHALRLAPR 


54600 


54601 


CGCGGGCTCTCCCGGCTGCGCTACCCGTTCTCCGCCGCCCACCACACGGCCACCCCGCCL 
RGLSRLRYPFSAAHHTATPP 


54660 


54661 



CG G AC CCTGTCG CTG CTGTG C CCG AC CCG CG AACG CGTCGG CAACGTCG AACG C _ 1 l _ 
R T L S LLCPTRERVGNVERFL 


5472 0 


54721 


GACAGCGTCGCCCGCACCGCCGCCGCGCCCGGCCGGATAGAGGCCCTCTTCTACG _ ,_AL 
DSVARTAAAPGRIEALFYVD 


5478 0 


54781 


GACGACGACCCCCAACTCCCTGCCTACCACGAGCTGTTCGAGCACGCCCGGTGGCGCTAL 
DDDPQLPAYHELFEHARWRY 


5484 0 


54841 


GG A CGG AT CGG CCGGTG CG CCCTG C ACGTCGGCG CCC CCGTCGGCGTAC C CC ACG C CTGG 
GRIGRCALHVGAPVGVPHAW 




54901 


AACCACCTGGCCCGGAACGCGGCCGGCGACGTGCTGATGATGGCCAACGACGACCAGCTC 
NHLARNAAGDVLMMANDDQL 


_>*_ i* D u 


54961 


TACATCGACTACGGCTGGGACACCGCCCTCGACGCCCGCGTCACCGAACTGAGCGCCC 1 t_ 
YIDYGWDTALDARVTELSAL 


55020 


55021 


CACCCCGACGGCGTCCTGTGCCTGTACTTCGACGACGGCCAGTACCCCGAGGGCGGL I 
HPDGVLCLYFDDGQYPEGGC 


55080 


55081 


GACTTCCCGATGGTGACACGGCCCTGGTACGGCACCCTCGGCITACTTCACCCCGACGAlu 
DFPMVTRPWYGTLGYFTPTI 


55140 


55141 


TTCCAGCAGTGGGAGGTCGAGAAGTGGGTCTTCGACATCGCCGACCGGCTGCACCU-L. l _ 
FQQWEVEKWVFDIADRLHRL 


55200 


55201 


TACCCCGTCCCCGGCGTCCTCGTCGAACACCGGC^CTACCAGGACTACAAGGCACCC- l — 
YPVPGVLVEHRHYQDYKAPF 


55260 


55261 


GACGCCACCTACCAGCGGCACCGGATGACACGGGAGAAGTCCTTCGCCGACCACGC__n_ 
DATYQRHRMTREKSFADHAL 


5532 0 


55321 


TTCCTGCGCACCGAGCCGGACCGCGAGGCGGAGACGGACAGGCTGCGGGCCGTCATCGLC 
FLRTEPDREAETDRLRAVIA 


55380 


55381 


______ ________ .-_—._ _« __. _->.«—>—- m n f— _^i_^j^<~n^ni^ tv tv f*f*f*f^f^ T-^I-l^-^ 

CGGGCACK3GAAC^CCCCGGACGCCGACCACGCCGACCATGCCGTTCATOACGCGGAGA,_ 
RAGNTPDADHADHAVHDAET 


5 544 0 


55441 


TTCTGGTTCACCGGCCTCCTGCGrcAGTCCCACGCC^AAGCTC 

FWFTGLLRESHAKLLAEL.UU 


55500 


55501 


GCGCCG<,GCCCGGCCGCCGGAGCCGTGCTCTTCGCCGACGGCTCCTCK3ACCGGCGTCX3CC 
APGPAAGAVLFADGSWTGVA 


55560 


55561 


TACCGCACCCACCCGCTCMCCAC^^ 

YRTHPLATALLASIPEATLD 



55620 



55740 
55800 
55860 
55621 TCCGGCCGCGCCGAC^ 55680

55681 ACCGTCGACTCCGCGTTCGGCTCCG^ 

*P V D S A F G 

55741 CCGGACGCCGCGCAACTCCGC^ 

PDAAQLR VG1J 

55801 CTGATCCACGACACCGCCGCA^ 

55861 GCCCTCACCTTCGTGGTGCCGCGCCCGGCACCGGGGGAGTCAGGCCCGTGTGCGG.CATCG 5S920 

A L T F V V P R P A P G E ^ r p v c g x y (or£i8) 

55921 TGGCGATCCGCTCCGCCGACGGCGGACTCGACGGCGGTGAACTCACCGCGCCGATG^ 
55981 ACCTGCGCCCGCGCGGC^^ 
56041 CCCTCGGCCACACCCGGCTCG^ 

56101 GCCCGGACC^CACCGTCCGGCTCGTCGTCAACGGCGAG^CTACGGCTACCGGGAGATCC 
PDG-TVRLVVNGEFYGYRH. 

56161 GCGCGGAACTGCGCGCCGCCGGCTGCCGGTTCCGCACCGGCAGCGACAGCGAGATCGCCC 
A ELRAAGCRFRTGSDb6.i«" 

56221 TCCACCTGTACCTGCGGGACGGCCGGCGGGCACTGGAGCGGCTGCGCGGCGAGTTCGCCT 
HLYLRDGRRALERLRGtr At 

562B1 TCGTCCTCTCGGACGAACGCCGCGCC^ 

56341 AACCCCTCTACTACACCGAGCGCGACGGGCGGCTCTACGTCGCCTCGACGGTCAGGGCCC 
pLYYTERDGRLYVAb l v ^ ~ 

56401 TGCTCTCCTCCGGCGCCCCCGCCCGCTGGGACACCGCCGCC^CGCCGCGCACCTGCAGC 
LSCGAPARWDTAAFAAHLUL, 

56461 TCGGCCTGCCCCCCGACCGCACCCTCTTCGCCGGCATCCGGCAGCTCCCGCCCGGCTGCC 
GLPPDRTLFAGIRQLPPOI.H 

56521 ACCTCATCGCCGACGCCCAC^CACCCGCGTCACCCCCTACTGGGACCTCGACTACCC 

LIADAHGTRVTPYWDLDYPP 

56581 CCGCCGGCGAACTCGCCGCCCGGGGAAGCCTGGACGACCAC 
56641 GGACCGACGAGGCCG^ 

56701 GCGGCGGCCTGGACTCCTCCGCCGTCGCCGCCTCCGCCGCCCGCCACACCCGGC^ 

GGLDSSAVAASAARHTRl^lA 

56761 CCTTCACCGTCCGCTTCGACGACCCCGCCCTCGACGAGAGCGCCGTCGCCCGGCGCACCG 
FTVRFDDPAFDESAVARR1A 

56821 CCGCCCACCTCKSCCATCGACCACCGCGAAGra 

AHLAIDHREVASERAHFAun 

56881 ACCTGCGGGACGTCGTCCGCGCCGGCGAGATGGTGC^ 

LRDVVRAGEMVQENSHGIAR 

56941 GGTACCTG^CAGC™ 

57001 GCGGGGACGAACTGTTCCTCGGCTACCCCC^ 

GDELFLGYPQFRKDLTLbLib 



55980 

56040 

56100 

56160 

56220 

56280 

56340 

56400 

56460 

56520 

56580 

56640 

56700 

56760 

56820 

56880 

56940 

57000 
57120



57540 
57600 
57660 



57061 CCGCC^CG^ 

57181 a C i^ 

57241 aaLI^^ 

57301 ggcgc^^ 

5736! TcLoCCOCC^ 

HHLFDLVKni*-* 

57481 CCGGCAAGTACCCG CT GC^^ 

GKYPL RAAMR 

S7601 T G Lo44TT A ^^ 
577 21 AACTC^CCAACJCG^ 

57781 ccLagcgggcagaaaggcggcaacgg^ 

psgqkggngg* 

5784 1 gccgagctgaccgcccggatcgccgccctgtcccccgaacgccgggcggcgttcgagaag 

57901 JTOCwcac^^ 

57961 ccggc^^^ 

SB02! LlcLllcLcGGCXGCGCGGCG^ 

5S08X CTGCGCGGCATCGTCCGCCGCCACGAGGTGC^ 

5BX41 GACCTCATcLgGTCGTC»CCCCACGGC^ 

58201 GGA^C^ 
58261 GAGCACGXX^CGCTGCTGCOT 

58321 LLoccI^ 
58381 Lgc^^ 

SSS01 LcLL^-CTQG^ 



ALE-- 
58561


^—^—^^^^^^r^r^r^r^-nrrrrcrr qppj^ppppa APPAPX^CCCTGCTGCTCTCG 

DRPRPAARRGEGANHALLLS 


58620 


58621 


_ _ * ~r*r*nnnnr rTrv^rrnarrTnrnrCGCCGCGAGGGCGGGTCGCTGTTCATG 
CC GG AG CTG AC CGGLCGbL l LbLLurtLL njHjLL.vj^v < ov»vjn>j\j\jwu'j * 

P ELTGRLADLRRRE GGSLFM 


58680 


58681 


~„ .^-^^^/^ r-TPPiYv TPPTrrrr ppthh P a ppnnfGG CCGGG ACCGG CTCG CC 

CTCGTGCTCi CCGCGCTCC IVjKj XLb A Lv.lOLuloy^nLLW^'jy^v-uviun'-wwwv* >~ 

LVLSALLVVLRGTGGRDRLA 


58740 


58741 


^ ^^/^^/^rr-^^/^/^/^/^/^r'pPAPPpppppppa APTrnAGrCGCTCATCGG 

GTCGGCACCC/TCGTCGCCGGCCGCACCCvjv-C^^uamui n-n i *>- 

VGTLVAGRTRPELEPLIGYF 


58800 


58801 


GTCAACGTCCTGCTGCj. oCCC 1 1 LoAunLLouv.uuv.^ovj/t^'* a j. a v_vj\_n_<~»* w w 
VNVLLLPFETGGRTSFAELW 


58860 


58861 


„„--„„~^«««« rrrrrr ^^TnGAGGMTACGCC 

CGGCGGGTCCvjCVjVJ^-tAJ^J^- luu luunuuv-u invuwv*v^v.wnwv™iw.v 

RRVRGRLVEAYAHQELPLEK 


58920 


58921 


_ _ « L—„ ^ r'p r r p & p rr p a P PGP P PP CGCCG ACCCGCCGGTCGG CGTGGTC 
ALELLRADGTAPAD PPVGVV 


58980 


58981 


TGCGTCG CCCAGCAGCCCGCCLCCCCLjA I L al.l.<^ i vjv„k_^vjvjMv- 1 v_o/^v-vj\-vj^vj^vj *. 
CVAQQPAPAITLPGLDASVE 


59040 


59041 


rtrt m rty--h ^rtf¥»rtrt«/^^/^ri^^"pTiPT , 'rppapPTPr;TPnTPnAGGTGCGCGAACGGCCGGAA 
GACGTCGACCTGGGCACCGCCCAG 1 1 CoAt-t. 1 i ^-vj i. ^urtuu i uv-vj\.unnvvjw\-v*«w^" 

DVDLGTAQFDLVVEVRERPE 


59100 


59101 


«/ 1 o«wrt^*^hTri««p«i^pparTLrranrnr,fiarrTnTTCGACGCGGCCACGGTCCGGCTC 
GGCGTGCAGATCGCC 1 1 CL.AIj 1 ai~«/\l.*~vjvjvj.hv-v. xwn (»unv.«^wu\.\,nwvw * 

GVQ I AFQ YDRDLF DAATVRL 


59160 


59161 


rtm^.rt/-.rtrt- R rtA» ^r^^p a ppppptpp^pc a pp a PPPPGCCGCCGACCCC AC CCTGCCCTGT 
CTCGCCGACCACGTGCACOCt-vj l ^l. i Lu/i^^ | j'J l - v - 1 J v - v -- ,J ^ v - uriv '^' v ' v 

LADHVHAVLDQAAADPTLPC 


59220 


59221 


rt^rtrt * ^^rt%rt/-./-i*-i«^r'^/^pppppppppr , ppprrpPPr;nPPrGCACGGCCGGCGCCACGACG 
GCCGAGCTGCCCGCCCCGLLCCCLLL.LoUooL.uk_i_ou^uuu 

AELPAPPAPAAPARTAGATT 


59280 


59281 


CTGCACGCCC/TGTTCGAGTCCCGCGCCGCoAA^avj^-V-uuom^vjv-vju x i j. 
LHALFESRAAKSPDAVALVD 


59340 


59341 


y-i/-<^.rt^irn/-ii\ i~>r*T>i\ rrrr'Ti rprTP a pppnPPPP AAPPGGCTCGCCCGCCAC 

GGCGGCCACCGCGTCACCTACCGGACCC 1 UAAL-AL.^cu^u^v»~/-i/Av_v_\jvj\- a \.vjs„^\.wv-wnw 

GGHRVTYRTLNTRANRLARH 


59400 


59401 


____„__-^A^/^^r'pp'rpprT'iiPppnppRPpr:r;nTf;nrGrTGCGCCTGCCCCGCGGCACC 
CTGCGCGCGGTCGGCG I wCbiA^^o/ioOAU^vj'ju ivjv»v-v>*- -i u^u^v x 

LRAVGVRTEDRVALRLPRGT 


59460 


59461 


--^^^—^^ /^^r^&pppTppppppppTPZkAPPPPGGCGCCGCGTACGTACCCCTCGAC 
GACGCGGTGACCGCCACCC 1CIjH_vj^ui_. 1 thftijuL'-vj\j\-uv.v»M\.vj inwui/v-vvv 

DAVTATLAAL KAGAAYVPLD 


59520 


59521 


^.rtrt^^^^^nArtrt^Rrt^Tv » npppTPRpppppnTPPTPflPPGACGCCCGCCCCGC CGTGGTC 
CCCGCCCTCCCCGAGGAACGGC J.(jALCCtjLA*it_\-i. ^(j^uv3«\-«v-v-v-vj»-v-v-^w*-^-vj 

PALPEERLTRVLADARPAVV 


59580 


59581 


— Artrtfriiv rp«T^-«p7i pp a ppppTPPPpppzir: ATPAPPGCCCACGCCGGCCATGAC 
CTCACCCCCGCGTATC1 tjCACoAt-uoO l LLuttunuA j. \_/\^^u\_v_*_~n^>j * ^ 

LTPAYLHDRSAE I TAHAGHD 


59640 


59641 


_,_ _ n ----.rt^m/i^ii pnrpp j, pa nppTPP.PPTkPPTPPTPPACACCTCCGGATCCACC 
CTCAACCTCCCCGTCCACCCCvjACAACt- ilullihh, i i ^.ftwiwvi»-v.wvn*vvn 

LNLPVHPDNLAYLLHTSGST 


59700 


59701 


_ _ rtrt«>-.oI * «r-r"ppTppTrpppn^ppiiPPf;p.r;nPGPGGTCAACCGCGTCGACTGGATG 

GGCACCCCCcLAGGGCG 1 L.V— i („OoV-ACL.\->i\-^-VJVJOO^rVJV-WVJ J. wwv.v.w\.v * - 

GTPKGVLGTHRGAVNRVDWM 


59760 


59761 


. ___ «r>ppT"rppppappppppfipr:TP,nrPGTrGCCCGCACCGCGCCCGGCTTC 
AGCACCGCGTACLCG 1 1 LLuuAL\»ouLunLu i outw i \.w vv.v.w wiw wv-w <- w« « 

STAY PFRTGDVAVARTAPGF 


59820 


59821 


^r^nn^r^rrrrh nrTPTTPPPPPPrPTGGrPGCCGGCGTCCCCCTCGTCCTCCTG 
GTCGACGCGGTC/1GGGAAL. 1 (.1 1 LUut llll i o«^w^»-\jw\.w * wvwv ww 

VDAVWELFGPLAAGVPLVLL 


59880 


59881 


CCG AC CG ACG AGGCGCGCG ACCCGG CCCTGCTG ACGGCGG CGCTGG AACGG CACCGGGTG 
PTDEARDPALLTAALERHRV 


59940 


59941 


AGCCGGATGGTGACGGTCCCGTCGCTGCTGACCATGCTCCTGGACGAGTCCGCCCGCJGCG 
SRMVTVPSLLTMLLDESARA 


60000 


60001 


ACGGACCTC^C^CCCGCCTGGCCrGCCTCCGCACCTGGATCACCAGCGGCGAGCCCCTG 
TDLGTRLACLRTWITSGEPL 

<2J& 


60060 






6006 



1 CCGCCCGCGCTCCCC^ 60120 
PPALARRFHDRLPGRTLbNi, 



60180 
60240 



60121 TACCGCTCCTCa*GACCGCCGC^ 

YGSSETAADATAAKiur^ 

60181 ACIGBKTCCC^^ 
60241 CGCGGCCCGG^ 60300 



60360 



60301 GCGTGCGTGGCC^CGGCTACCACGCCCGT^ 

6 036i gatcccgacog^ 60420 

60421 GCCG^GCOGGCT^^ 60480 

60481 CGCGCCGAGCCCGGCGAGGTCGAACACGC^ 60540 

RAEPGEVEHALLAHfAv^^ 

60541 GCCGTCACGGCGAACCCCGACGCCACC^ 60600 
60601 CCGTTCGCCGCCGGCTCCCCCCAG^ 60660 

60661 GCCCACCTCGTGCCCACCGCCGTCACCGTCCTGGACGAGCTGCCGGTGACCGCGCACGGC 60720 

AH LVPTAVTVLDELPV 1 a n vj 

60721 AAGACCGACCACGCGCGGCTGCCCGCCCCCGACCC^^ ^180 
KTDHARLPAPDPRAGRPAP l 



60840 



607 81 GCCCCCCGCACCCCCACC^ 
6084! 1gGGGCCGGTCG g GCGCG 

609 01 GCCCGCAGTCGCGGC^AACTCCGCGCCCGCCGC^ 

60961 CTTCGCGGCCCCCACCGTCG^ 

LRGPHRRRSVAARTDAAKtr 

61021 ACCGGCCCCGAGCACACCCCGTTCGTCACCGACCCCGGCGCCCGGCACGAGCCG^CCC 
TfiPEHTPFVTDPGARHb** 



60960 
61020 
61080 



610B1 CTCACCGACGTCCAGCGGGCCTACTACGTGGGACGCGAGGGCGGGTTCGCCCTCGGCGGC 61140 
LTDVQRAYYVGREGGFAbt.^ 



61141 GTCTCCACCCACGCCTACCT^AGATCGAGGCCCCGCGGATCGACGTCGCACGGTTTACC 
VSTHAYLEIEAPRID VAKr 

61201 GGCGOGraOGO^ 

61261 GCMCTCCAGCAC^GTGCTCACCGACGTCCCCCCGTAC^ 



GAL 

GGGCTCCJ 
G L Q Q 

61321 GACCTGGACGAGCCCGCGCGGCAG^ 

DLDEPARQRRRAALREEMSH^ 

61381 CAGGTGGTGCCCGCCGACCTCTGGCCCCTGTTC^ 

QVVPADLWPLFDVRVSLGP l 

61441 G ACG CCCT CGTCCACGTGGGGGTGG ACG CGCTG ATCTGCG ACG CCC AC AGCTTCGG CC TC 
DALVHVGVDALlCDAHb^^^ 

61501 GTCCTGGCCGAACTCGCGGCCCGTTACGCCGACCCCG^ 




61200 
61260 
61320 
61380 
61440 
61500 
61560 




61561 CCC.ACTTCCGGGACCACC^ 

IGCGCGAACGCCTGCCCGAGCTGCCGCCCGGCCCCGAACrrcCCC 
AAE RYWRERLPELPPGP ELP 



61621 
61681 
61741 
61801 
61861 
61921 
61981 
62041 
62101 
62161 
62221 
62281 
62341 
62401 
62461 



62521 
62581 
62641 
62701 
62761 



62821 
62881 
62941 



GCGG CGG AGCGGT ACTGG i _ _ . p p G P E L. f 

A AERYWRERLPE LPF ^ r 

GGCGTACTGCTG^ 

LLg!^ 

GGCGACTTCACCTCGCTCAGCCTGCTGpAGGTCGACCACA 
G^CGTGACGGTGACACGGGAACGGGCGCT 

CCCGTCGTCTTCACCTCCGACCTGCCTGTCGGCGAGACCGCGGCCGAGGACGCGG 
PVVFTSDLPVGE1AAC 

GGAGAGGGATGGGCGCTCGGAGAGCCCGTCTACGGCGTC^GCCAGACCCCGCAGGTC 
GEGWALGEPViu 

CTCGACCATCAAGTCGCCGAAGACCGAGGGGAG1TGGTCTTCAACTGGGACGCCCTTC 

LDHQVAEDRGELVFNwu 

GACCTGTTCGCCCCGGGCGCCCTGGACGCCATG^CGCCGCCTACACCGCCTCGCTGACC 

DLFAPGALDAMFAAii 

CGCCTGGCCCGGAGCCCCGAAGCCTGGCGGCGGCCCGGCACGCCGCCGCTC 
C*GGCGGCCGTGCGC^ 

CCCCGCCTCGGCGCCCGCCCCGGCCG^ 
CAGGCCGTCGCCGCGCTGGGCGTCCTGGAGTCGGGGGCGGCGTACCT 

GAACTGCCCGCCGAACGGC^GTCCACCTC^ 

ACCGAACGCGCCCTGCTC^ACACGCTCGCCGTCCCCGTC^ 
TERALLDTLAVPVGVTVU 

GACGACGACGCGGCCCTCGACGCCGACGGCGGCCCGCTG^GAGCGTGCAGAACCTCACC 
DDDAALDADGGP^W^ 

GACCTGGrcTACACCAT^^ 
GACCACCTCGGCGCGGCCAACACCCT^ 

DH LGAANTLECVNRR * . 



61620 
61680' 



61740 
61800 
61860 
61920 
61980 
62040 
62100 
62160 
62220 
62280 
62340 
62400 
62460 
62520 
62580 
62640 
62700 
62760 
62820 
62860 
62940 
63000 







63001 GGCGACGCGGTCCTCGCCGTCTCCTCCCCGAGCTTCGACCTCGCCGTCTACGACCTGTTC 63060
GDAVLAVSSPSFDLAVYDLF 

63061 GGCGTGCTGGCCGCCGGCGGCACCGTGGTCGTCCCCGCCCACGACCGCOK3CG 63120 
GVLAAGGTVVVPAHDRRRDP 

63121 GGACACTGGGCCGAGCTGATCCGGCGCGAGCGGGTCACCCTGTGGAACTCCGTCCCCGCG 63180 
GHWAELIRRERVTLWNSVPA 

63181 CTGGGCACCCTGCTCACCGAGTACGCCGAGGCCCTCGCCCCCGACGCCCTGCGCACCCTG 63240
LGTLLTEYAEALAPDALRTL 

63241 CGGGCGGTCCTCCTCAGCGGCGACTGGATCCCCCtcggacCgcccgaccGGATCCGCGCC 63300 
RAVLLSGDWIPLGLPDRIRA 

63301 CTGTCCGCCCCCGGCGCCACCGTGATGAGCCTCGGCGGCGCGACCGAAGCCTCCATCTGG 63360 
LSAPGATVMSLGGATEAS IW 

63361 TCGGTCTGGTACGAGATCGGGAAGGTGCACGAGGCGTGGAGCAGCATCCCCTACGGCACC 63420 
SVWYEIGKVHEAWSSIPYGT 

63421 CCCATGGCCAACCAGCGGCTGGAGGTCCTCGACGAGCAGCTGCGGCCCCGGCCCGACTGG 63480 
PMANQRLEVLDEQLRPRPDW 

63481 GTGCCCGGCGAGCTGTACATCGGCGGCACCGGCGTCGCCAAGGGCTACTGGCGCGACCCG 63540 
VPGELYIGGTGVAKGYWRDP 

63541 GAACAGACCTCCCTGCGCTTCCCCGTCCACCCGGGCAGCGGGCAACGCCTGTACCGCACC 63600
EQTSLRFPVHPGSGQRLYRT

63601 GGGGACTTCGCCCGCCACCTCCCCGACGGCACGCTGGAATTCCTGGGCCGGCAGGACGAC 63660
GDFARHLPDGTLEFLGRQDD

63661 CAGGTGAAGATCGGCGGATTCCGGGTCGAACTGGGCGAGGTCGAGGCX3GCCCTCGGCCGA
QVKIGGFRVELGEVEAALGR

63721 CTGCCCGACGTCGCCGCCGGCGCGGTGATCGCCACCGGTGACCCGCGGGGCGACCGCCGC
LPDVAAGAVIATGDPRGDRR 

63781 CTCGTCGGCTTCGCCGTACCGGCCCGGGAGGGCGGCTTCGACGCGGCCGGGCTCCGACGG 63840 
LVGFAVPAREGGFDAAGLRR 

63841 CAACTCGCCCGGCGGCTGCCCGCCTACATGGTCCCCACGACCCTGCTGCCCCTGGACCGG 63900 
QLARRLPAYMVPTTLLPLDR 

63901 CTGCCGCTGACCGCCAACGGCAAGGTCGACCGGGCCGCACTCCAACGCCTCGTCCCCGGC 63960 
LPLTANGKVDRAALQRLVPG 

63961 CGCGCACCGGCCCCGGCGGAACCCGCCACCGCCCCACCTGCCCGTTCCCGCGCCGTCCCC 64020 
RAPAPAEPATAPPARSRAVP 

64021 GTGCCCGGCTGGCTCGCCGACCTGTGGTGCGAACTCCTCGACGTGCCGGAGGCCGACCCC 64080 
VPGWLADLWCELLDVPEADP 

64081 GACGCGAACTTCTTCGCCCTCGGCGGCACCTCCCGGGTCGCGATCACCCTGGTCACCCGG 64140 
DANFFALGGTSRVAITLVTR 

64141 ATCGAGGCCCGACTCGCCGTCCGGGTGCCCCTCGCCCGCCTCTTCGACGCCCGCACCCTG 64200 
IEARLAVRVPLARLFDARTL 

64201 GGCGGCCTCGCCGAGACGATCGCCGAACTGTCGGCCGCCGCCGAGGAGGAGCCGGCACCC 64260 
GGLAETIAELSAAAEEEPAP 

64261 GCCGAGCCCGTCTACGCCCCOTACCCCGCCACCCGCCACGAGCCGTTCCCGCTCACCGAC 64320 
AEPVYAPDPATRHEPFPLTD 

64321 ATCCAGCGCGCCTACTGGCTCXSGCCGGC^CCGCTCCCTCTCCCTTGGCGGCGTCGCCACG 64380 
IQRAYWLGRHRS LSLGGVAT 

64381 CACACCTACCTCGAACTCGACGTCGAGGACCTCGACCCCGGCCGGCTCCAGACGGCCCTC 64440 
HTYLELDVEDLDPGRLQTAL 

64441 CGCCGGCTGATCCACCGCCACGACGCCCTCCGGCTCGTGGTCCTCCCCGACGGCCGGCAA 64500 
RRLIDRHDALRLVVLPDGRQ 




63720 
63780 




64501 CAGATCCTCGGCGACGTAC(MCCGTACCTCCTCGCC^CACCGACCTX3CG<MGCAGGGCG 64560 



QILGDVPPYLLAHTDLRGRA 
64561 GACGCCGAG^ 

64621 TCCCGCTC^CGCTCTTC^ 

SRWPLFDVRTHRLDDVRTKb 

64681 CACCTGAGCTTGGACCT^CTCATCGCCGACGCC^CAGCGTC^CGTACTCACCGGCGAC 
HX-SLDLLIADAHSVMVl.i^u 

6474 1 CTOCTCACCTTCTA^^ 



L L T F 
64801 CACTAC.T^ 

64861 (»CTGGC«^CCGGCTGG^ 

HWRARLADLPGPPGLPLRCR 

64921 CCCGAGGAGCTGACCGCGCCGCGGTTCGCCCGCCTCACCACCG^ACTCGGCCCCGACGCC 
PEELTAPRFARLTTGLGPDA 

64981 TGGGCACGGCTGCGGCGCGCCGCGGCGGCCGCCGAACTCACCCCGGCCGCACTGATCTGC 
WARLRRAAAAAELTPAALIC 

65041 GCCGCCTTCTCCGACGTCCTCGCCCAGT<^AGCGACACCCCCOT 

65101 ACCACCTTCCACCGCCCCGCCCTGCTCCCCGGCGTGGACGACCTCGTCGGCGACTTCACC 
TTFHRPALLPGVDDLVGDFT 

65161 ACCACGACCCTGCTCGGGGTCGACGGCGAGGGGGACACC^CCGGGACCGGGCCCGCCGA 

GVDGEGDTFRDKAKK 



T T T 

65221 CTCCAGGACCGCATCTGGGAGGACCTCGAACACCGCGTCGTCAGCGGCGTCGAGGTCCTG 
LQDRIWEDLEHRVVSGVEVL 

65281 CGGATGCTGCGCCGCGAGCGGGGCA^^ 

RMLRRERGTHDAVRMPVVt I 

65341 MCACCCTG^ 

STLRAAGPAPRTAPPAWRVK 

65401 CCCGGCTACGCGATCAGCCAGACCCCGCAGGTCCTGCTCGACCATCAGGTGAGCGAGAGC 
PGYAlSQTPQVLLDHQVSEb 



65461 GACGGCCGACTGGTCTGCACCTGGGACTACGTCGCGGACGCCTACCCGCCC^GCTCATC 
DGRLVCTWDYVADAYPPGLI 

65521 GAGGCCATGCTCGGGGCCTTCGAGGCGCTCCTCGCCTCGCTCGCCGGT^CGACGACGAC 
_ — « * p E A L L a q L.AGHUD u 



F G 



E A M 

65581 GCCGGCCACGACGACGACX5CCGGCCACGACGACGGCCCCGG 

AGHDDDAGHDDGPGHDDGPG 



65641 CACGACGACGGCCCCGGCCACGACGACGGCCCCGGCCACGACGACGGCCCCGGCCGC^ 
HDDGPGHDDGPGHDDGPGRU 

65701 GACAGTGCCGATCACGGCCACAGTGCCACGCACGACGACAGCGCCGCCCGAAAC^CAGA 
DSADHGHSATHDDSAARNDR 

65761 G AGGG AGGTGG AC CGGAGTG ACG AGCG C CCGG C C CACG CCG AC ACTGCTCC CCGCCG ACC 
EGGGPE M * TS ARPTPTLLPADQ 

65821 AGCGGGAGCTGCTGCGGATGATGAACGACCGCACCGCACCCGTGCCCGCG^CACCCTCA 
RELLRMMNDRTAPVPAHTL1 

65881 CCG CC C AACTGGC CG ACGCCGCGCG CACGCACG ACCGGG CTCTGG C ACTGGTGG CACCGG 
AOLADAARTHDRALALVAPG 



64620 
64680 
64740 
64800 
64860 
64920 
64980 

65040 

65100 

65160 

65220 

65280 

65340 

65400 

65460 

65520 

65580 

65640 

65700 

65760 

65820 
(orf16)
65880 

65940 




65941 ««««CR«^^ 66000 

66 ooi J™ 66060 

6606X -OTCOTCCO^ 66120 

«m J^ccclc^ 66180 

66181 CC^C^ 66240 

acLcccLLo^^^ 66300 

ACGCCGCCTACCGGCTGGAOTCCCCCGTCAGC^CCGCGCGATCACCACCGCCG 66360 

agI™ 66420 

actcgIL^ 66480 

TCACCCGGGAC^ 66540 

«I4~^ 66600 

666 oi acLggcggcaaactgc^ 66660 

ACCCCGCCCTCG^ 66720 

TCTCCTCGGCCACCCCGTCCGG^^ 66780 

66781 cggaa^ 66840 

66841 ccLga^ 66900 

669 ox ccLLgccccccaccggagaggagc^cgcac^ 66960 

AEPPTGEEnA 

66961 GCGAACCGCTGCTGCGCACCGC^^ 67020 

67021 TCGTCGGCGACGAGACCGCCCGGATCAGCGTCCGCGACCGGCCCCTGAACCTCCAGGACA — 

VGDETARISvki;^^ 

67081 CCG^GACCGCC^^ 
.7X41 GGGGAGACGAGTCGCTCGCGCGGGTACGGCTC 

672 oi aaLctcgcccatctgo^ 
67261 tg^gtccgctgc^ 

67321 AGGCCCCCGCCCCCGCTGCCGTG^ 
67361 ACGAGGCCGAACTCCTCGCCC^G^ 




66301 
66361 
66421 
66481 
66541 



66661 
66721 



67080 
67140 
67200 
67260 
67320 
67380 
67440 







6744 1 CCGTCGAACCCGATATCAACCTC^ 675 
VEPDMNLLDAGATSVELVRL 

67501 TGGCGACCGCTCTGGAGGAGGAACTCGGCCTCGACACCGACATCGAGGAA^CrTCGCCT 67560 
ATALEEELGLDTDI EELLAr 

67561 TCCCGTCGGTCGCCGTGATCGTCGGCCGCCACCTCGGCCGCCGGACGGCACCACCGGCCC 67620
PSVAVIVGRHLGRRTAPPAR 

67621 GGGACCCCCTCCCGCCCGCGTCCGTAGCGTTCXSCACCCC^GTCaSTACTGCCCGCGCCGC 67680 
DPLPPASVAFAPGSVLPAPP 

67681 CCGCGCCCGGACCCGTGCCGCCCGCGTCCGTGCCGCCCGCACCCGCGTCCGTACCGCCCG 67740 
APGPVPPASVPPAPASVPPA 

67741 CGTCCGAGTCCTCACCGCTCGCGCCGCCCGCACCCGGGCCCGTGCCACCCACGCCCGTCC 67800 
SESSPLAPPAPGPVPPTPVP 

67801 CGCCCGCCTCCGTCCCGCCaSCGTCCGGGGCCG^ 67860 
p AS VPPASGAAPHVPPAPPA 

67861 CACCCATCCCCGCGCCCTCCGTGCCccccgcgccccgcccccaaccgcccctgctcaccg 67920 
PIPAPSVPPAPRPQPPLLTG 

67921 gcatcggcgcccgccaggcgTTCAAGGACGCCCACCACGGCATCCGGCACGAGTTCGACG 67980 
I GARQAF KDAHHG IRHEFDA 

67981 CCACCGACGGCGTCGCCCTCAGCGGCCCGGACGACCACCACCTCACCGCCCGTCGCAGCC 68040 
TDGVALSGPDDHHLTARRSH 

68041 ACCACCGCTTCGACCCCXSGCCCCGTGACGCrGCCGGACCTGGCCGCCCTCCTCGGGGCCC 68100 
HRFDPGPVTLPDLAALLGAL 

68101 TCCGCCGGGTCCGCGGCCCGGGAGGCGAACCCAAATACGCCTATCCGTCGGCCGGTTCCT 68160 
RRVRGPGGEPKYAYPSAGSS 

68161 CCTACCCCGTCCAGACCTACCTGCTCGTCCACCCGGGGAAGGTGACCGGACTGCCCGGCG 68220 
YPVQTYLLVHPGKVTGLPGG 

68221 GCAGCCACTACGTCCACCCCGCGCGCAACCGCCTGGTGAGCATCGACCCCACCGCGACCC 68280 
SHYVHPARNRLVSIDPTATL 

68281 TGCCCGCCGACGCGCACGCCGAGATCAACCGCGCCGCCTACGGGGAGGCGGCCTTCTCCC 68340 
PADAHAE INRAAYGEAAFSL 

68341 TCTACCTCATCGCCGCGATCGACGCGATCACACCGCTCTACGGCGATCTCTCCTGGGACT 68400
YLIAAIDAITPLYGDLSWDF 

68401 TCACCGTOTCGAGGCCGGTGCCATGACCCAGTTGCTGATCCGGACCGCCGTCGGCACCG 68460 
TVFEAGAMTQLLMRTAVGTG 

68461 GCATCGGCCTGTGCCCCGTCGGCACGATGGACCCCGCGCCGCTGCGCCGCGCGTTCGCCC 68520
IGLCPVGTMDPAPLRRAFAL

68521 TCACCGACCGGCACCGCTTCGTCCACGCCCTCCTCGGCGGGCGGCCCCGCACGGAGGCCC 68580
TDRHRFVHALLGGRPRTEAP

68581 CGTGAACCGGCACGGCCCCCTGGCGGGCCGGCGGCAGAGCGTCGACACCCGCAGCGCCGC 68640

MNRHGPLAGRRQSVDTRSAA (orf15)

68641 GTGGGTGG CGCCG ACGGGCAC CCCGGGGCTG CCG CTGG AGGTGG CCG CCAC CCGGGACGG 68700 
WVAPTGTPGLPLEVAATRDG 

68701 CGTCGACCCGGCCGAATGGGCCCGCACCCACCTCGACACCGTCACCGGCTGGCTGCACCG 68760 
VDPAEWARTHLDTVTGWLHR 

68761 TCACGGAGCCGTCCTGTTCCGCGGCTTCGGCGTCGGCCTCGACGGCTTCGGCGACGTCGT 68820 
HGAVLFRGFGVGLDGFGDVV 

68821 CCACGCCCTGGCCGGATCCCCCGAGGCGTACGTCGAACGGTCGTCGCCGCGCACCGCCCT 68880
HALAGSPEAYVERSSPRTAL 









68881 CGGGCATC*CCTCTACACCGCCA^^ 
68941 CGAGAACTCCTACC^^^ 

69001 cLaccggcggcgcgac^^ 



68940



R T G G A T 



6906! COCCCTCBIWCC^ 



A L 



A A 



69121 



GATCGGCATGTCCTGGCAGGACGCCTCCCAGACCC^ 



69181 C^CGCCGCCCGCCGCG^^ 

69241 GGTCCGCCCCGCCCTCGCCGTCCACCCGGCGACGG^ 

69301 CGCGTrCTTCCACG^ 



H 



69481 



693 61 CGACGAACGCGACCTGCCGAGCCACTC^ 

69421 CGTeA^^ 

CGGCGACGTCCTCCTCGTCGACAACCTCCTCAC^^ 
GDVLLVDNLLTAHGREPr 1 

CGAACGCCGCGTCGTCGTCGGCATGGCACAGCCGCTGGACTCGGACGAGGTGAGC^CGTG 



69541 



69601 



GMAQPLDWD 
" re 

ACCGCCCCC^^ 

TAPGTPLPATFVQRGLWPb i 



69661 



69721 



69781 



69 



CGCCACGCCCGCCCGGCGGAGGTCACCCACGTCCGCGCCCTGCGCCTGACCGGGGACACC 
RHARPAEVTHVRALRLTGDT 

GACACGGCGCGGCTCACCGAGGCCGTCCGGCGGGTCACCGCCGCCCTCCCCGCCCTCAC^ 
DTARLTEAVRRVTAALPAL 

GCCGAACTCTCCGGCGACGAGGAACCCCGCCTGACCCTCCGGCCGGACGCCCCCGAGGTC 
AELSGDEEPRLTLRPDAPEV 

841 ACCCCGGTCGACC^ 

69901 CTCCGCGCCGACCGGGACCACCCTCGCGCCGGACGCCACCGGGCCCGCTTCCACCTGGTG 
LRADRDHPRAGRHRAKl-n^ 

69961 CGGCTCCACGACGACGAGACGGTGCTCGCGCTCACGGCCC^ 

RLHDDETVLAliT.A" j. 

70021 CCGTCTCTCTACGCCGTGCTCGGCGCGGTCTGCCM 
,003! LlcTACCGCGACGCCACCACCCTG^ 
70141 GCCCGGGCCTCCC^CCGGCGCTGGTGG^ 



A R 



702O1 



70261 



70321 



CCGGCCCCCGC^CCCGCCCCGCGACCGGGTCACCGAAACC^ 

GCAGCGCGCTGGAAAGCCCTGACCGCCCTGACCGCCCTG^COTCCCCCTCGGCGGCAAC 



AARWKALTA 



GGCTCGCTCGCCGTCATGGCCCTGGCCGCCTGGTGCCTGCGCGCCCCGGACCACCGGGGA 




69000 
69060 
69120 
69180 
69240 
69300 

69360 

69420 

69480 

69540 

69600 

(orf14)

69660 

69720 

69780 

69840 

69900 

69960 

70020 

70080 

70140 

70200 

70260 

70320 

70380 



GSLAVMALAAWCLRAPDHRG 

70381 CCGGCCCGCTTCACC*COT^ 70440 
pARF TTVVDLRDHL.GLGPAV 

70441 GGCCCGTTC ACCG ACCG CCTCGTCTTCGGCGCCG ACCTCGG CG AAG CGCCG CG CCCCTCC 70500 
GPFTDRLVFGADLGEAPRPS 

70501 TTCCGGGArcTCACX3CTGCGCX3CCCAGTCrcGGTTCCTGGACGCCGTCGTGCACTACCTC 70560 
FRDVTLRAQSGFLDAVVHYL 

70561 CCCTACGGCGACGTCGTGGAACTCgGCAGGGAACTGGGCCGCGTCACCGCGCCCCGCACC 70620 
PYGDVVELGRELGRVTAPRT 

70621 GCCGCGCACTGGGACGTGGCGCTGAACTTCTGCCGCAACCCGCCCACCAGCGCCGCCACC 70680 
AAHWDVALNFCRNP PTSAAT 

70681 CGCGGCGAACGCACCCTCGCCGAACGCGGCCTGTCCATCGAGCTGTTCCGCGAGGCCGAC 70740 
RGERTLAERGLSIELFREAD 

70741 CTG CTCGGCG CGG C CGGCACCGGTCCCG CG CACCGGTGGGACGGCACGGTG CTCGCCCTC 70800 
LLGAAGTGPAHRWDGTVLAL 

70801 TCCCTAGGCGAACTCGGCGACGACACCGTGCTGGTCCTCGACGCCGACCGCGACCACCCG 70860
SLGELGDDTVLVLDADRDHP

70861 CACCACGGAACCGCCGACCGGCTGCTCCACCGGATGGACGAAGCGCTCCTGGCGGCCGTC 70920
HHGTADRLLHRMDEALLAAV

70921 GCCGACCCGGACGCCCCCCTGCCCCCCTTGCCCGCCCCCGCGCACACCACGAGGAGCCAC 70980
ADPDAPLPPLPAPAHTTRSH

70981 CGATGACGACGACCCCGCGGACCGCCGCCGAGCCCACCTACCACGTGGTGGTCAACGACG 71040

MTTTPRTAAEPTYHVVVNDE (orf13)

R * 

71041 AGG AG C AG T ACTCG ATCTGG CTCG CCG AACAGG AG ATC CCGGCCGG CTGG CGGG C C ACCG 
EQYSIWLAEQEIPAGWRATG 

71101 GAACCTCCGGCACCCAGGAGGAGTGCCTGCGCCACATCGACGAGGTGTGGACCGACATGC 
TSGTQEECLRHIDEVWTDMR 

71161 GCCCCCGCAGCCTGCGCGAGGCCATGGCCGCGGCGGAGCACGCGGAGCCCGCTCCCGCCC 71220 
PRSLREAMAAAEHAEPAPAP 

71221 CGGCCCCGGCCGAGGAGGAGCCGAGCCTCGTCGACCGGCTCTGCGCGGGCGACCAGCCGG 71280
APAEEEPSLVDRLCAGDQPV

71281 TGGAGTCGGTCCTCCGCCCGGAGCGCACGGCCGCCGCCCTGCGGGAGGCCGTCGACCGCG 71340

ESVLRPERTAAALREAVDRG

71341 GCTACGTCTTCGTCCGCTTCGCCGCCACCCGCGGCGGCACCGAACTCGGCGTCGCCGTCG 71400

YVFVRFAATRGGTELGVAVD

71401 ACCCCGCGGCGACC^CC^TGGACGGCACCGAGCrrGCGCCTGACCGGCACCCTCACCCTCG 71460
PAATTMDGTELRLTGTLTLD

71461 ACTTCGAACCGGTCCGCTGCCACGCCCGCGTCGACGTGACCACCTTCACGGGCGAGGGCC 71520

FEPVRCHARVDVTTFTGEGR 

71521 GCCTGGAGCG CGTG TCCGGCACCTGACCCCCGCCGGCCACCCGGCCGTG AGG CGCGGCTC 71580 
LERVSGT* 

71581 GGGACCGGGCCGCCGACCCACCGAAGGGAGGGACCCCATGACCACCCCCATGACCACCCC 71640 

MTTPMTTP (orf12)

71641 CACGACCACCCGCACCACCACCCGCACCGCCGTCTTCGCCCACCTCCGCGCCCCCGGCCT 71700 
TTTRTTTRTAVFAHLRAPGL 

71701 CGGCGACCTCCTCCAGTCC^ACATCGGCCTCGCCCTCGTCCGCCGCGCCCGCCCGGCGAC 71760 
GDLLQRNIGLALVRRARPAT 

71761 GGCGGTCACCCTGGTCGTCGGCGAGGACCTX3GCGGCCCGCTTCGGTCCGGCACTCACCCG 71820
AVTLVVGEDLAARFGPALTR 



71100 
71160 







:CACACGTACGCCACCGACGTGCTGCCCTGCCCCCA 
HTYATDVLPCPQ 



71821 CCACACGTACGCCACCGAC^ 71880 
u-rvaTnVLPCPQRGEADPRW 



71881 GCCCGCCTTCCTGCGCACCCTGGCCGACCGCCGCTTCGCCC^^ ^1940 
PAFLRTLADRRFALAVVDPD 

72000 

iGGGCQ^tAUjtL^VjV-^rtv.u^^v.\j«vjN-w>-'->"- 

S Q 

72001 GCCGCAGGACCC*CCC<^GACGAACACAT^ 72060 



72180 



72240 



71941 CAGCCAGGGCCTGCACGCCGGCCACGCCCGGGCCGCCGGCGTGCCCGAGCGGATCGGCCT 
SQGLHAGHARAAGVPERIGL 

3CCGCAGGACCGGCCCGGAGACGAACACATCACCCATCCCATCCGCCTCCCACGTCCCC 
PQDRPGDEHITHPIRLPRPL 

72061 GTCGGGGACCCCGGACCTGTACGAGTACGCCA^^ ^2120 
WGTPDLYEYATALAAALGLP 

72121 CGCACCGCCGCGCCC(^CGACGTCCTGCCGGAGCTCCCCCGCACCCGCGGCGTC^CC^ 
APPRPGDVLPELPRTRGVRP 

72181 GCCGACGGCCGGTCTGCCC^GTCCGCTCGTCGCCGTCCACCCCGGCGGGGCACCG^CTC 
PTAGLPRPLVAVHPGGAPHW 

72241 GAACAGGAGATGGCCGCTCGAGCACTACGCCCGGCTCTGCGC^ 72300 
NRRWPLEHYARLCARLAAEL 

72301 CTCGGCCTCCCTCTGCCTGCTGGGCGACGAAGCCGAACGCCCCGAGCTGGAACTXjCTCCG 72360 
SASLCLLGDEAERPELELLR 

72361 GCACGCCGTCCTGACGCGGTCCCCGCGAGCCGTCGTCCACCTCGAGGCGGGCGCGGACCT 72420 
HAVLTRS PRAVVH LEAGAD L 

72421 CGACCGGACCGCGAACGTCCTCGCCGACGCCGACCTGCTrcTCGGCAACGACTCCTCGCT 72480 
DRTANVLADADLLVGNDSSL 

72481 CGCCCACGTCGCCGCCGCCGTCCGCACCCCGTCCGTCGTCCTCTACGGCCCGACCG^ 72540 
AHVAAAVRTPSVVLYGPTGT 

72541 CGAGTACCTGTGGACCAGGATCTACCCGTACCACCGCGGGGTCTCCCTGCGGTGGCCGTG 72600 
EYLWTRIYPYHRGVSLRWPC 

72601 CCAGCGGCTGCGGCACGCCGCAGGCGAACTCGCCGGCCGGCGGTGCGCGCAC^CTGCGT 72660 
QRLRHAAGELAGRRCAHGCV 

72661 cctgccctaccagggcccggccggcccgtAtccgcgctgtctggccgacctgccggtgga 

LPYQGPAGPYPRCLADLPVD 

72721 CAGGGTCTGGCCGGCGGTGACCGCCCGATGGGCGAGCCCCCACCCCGTGACGATCAGGAG 
RVWPAVTARWASPHPVTIRS 

72781 TACCCCATGAGCGCCGACCCGTCCCGGGTGCGGACGATCCTCTCCGTCAACTTCAACCAC 

TP M*SAD PSRVRTILSVNFNH 

72841 GACGGCTCCGGCGTGCTGTTGCGGGAGGGCAGGATCGCCGGCTACGTCACCACCGAGCGC 
DGSGVLLREGRIAGYVTTER 

72901 CGCTCCCGCCTCAAGAAGCACCCGGGCCTGCGCGAGGAGGACCTCGACGAACTGCTGGAC 
RSRLKKHPGLREEDLDELLD 

72961 CAGGCCGGGGCCGACCTCTCCGACATCGACCACGTCATGCTCTGCAACCTGCACACCATG 
QAGADLSDIDHVMLCNLHTM 

73021 GACACACCCGACATACCCCGGCTGCACGGCTCCGACCTCAAGGAGACCTGGCTCGCGTTC 
DTPDI PRLHGSDLKETWLAF 

73081 TGGGTCAAC^GCGCAACGACGAGGTGAGCCTGCGCGGCCGCCGCATCCCCTGCACCGTC
WVNQRNDEVSLRGRRIPCTV 

73141 AACCCGGACCACCACCTCATCCACGCCGCCACCGCCTACTACACCTCCGGCTACGACTCG 
NPDHHLIHAATAYYTSGYDS 

73201 GCGATGGCCGTGGCCATCGACCCCACCGGCTGCCGCGCCTTCGCCGGCAAGGGCAGCCGC 

AMAVAIDPTGCRAFAGKGSR




72720 

72780 

72840 
(orf11)

72900 

72960 
73020 
73080 
73140 
73200 
73260 




7326! CrCrACCCCCTGCGCCGCGACCrCGACGCCTGG^C^CGCC^CATCGGCTACTCCTAC 73320 
LYPLRRDLDAWFNANIGYCI 

73321 GTCGCCGACCrGATGTTCGGCTC^^ "380 
VADLMFGSSIVGAGKVMOi.« 

73381 CCCTACGGCAGACCCGCCGACGGCGCCGGCCCCG^ " 440 

,3441 OJKtTOOCOOCCn^ 73500 



73S01 AGGAAGCTCAACGCCACCCTCGCCCACTACATCCAGCTGGGCCTGGAACGCCAGCT 7356 

RKLNATLAHYIQI'G LERyl ' 
73561 GCCGTCTTCGCCGAGCTCGCCCCGCTCTCCGCCCGCAACG^ 7362 



L 

CTCTCCGGCGGTACCGCCCTCAACGCCATOTCCACCCAACTCGCC^CGAGTCGACCGGC 73680 



73621 £^g^Q q * it ^ 
73681 TTCGAGCGCATGCACCTCCACCCCGCCTGCGGCGACGACGGCACCGCGATCGGCGCGGCG 73740 
FERMHLHPACGDDGTAIGAA 

73741 CTCTGGCACTCGCACCACGTCCT^^^ 73800 
LWHWHHVLGHPRLHHTNADi, 

ATGTACTCCGTCCGTGAGTACCCCGAGCACACCGTCCGGCGGGCCGTGCGGGACCACGCG 73860 
MYSVREYPEHTVR RAVRDHA 

GCCGACCTCGTCGTCGAGGAGACC(K3C(»CTACGTCGCCA(3<3GCCGCCG^CTGGTCGCC 73920 
ADLVVEETGDYVARAAELVA 



73801 



73861 

A D L 



73921 GGCGGCGCCGTCATCGGCTGGTACGACGGCGCCGGCGAGGTCGGGCCGCGGGCCCTGGGC 73980 
GGAVIGWYDGAGEVGPR« lj,J 

73981 CACCGCAGCATCGTCGCCgAcCCGCGCGACCCCGCCATGCGGGACCGGCTCAACTCCCAG 74040 
HRSIVADPRDPAMRDRLN&U 

74041 GTCAAGTTCCGCGAACACT^CCGGCCCTTCGCGCCGTCCGTGCTCAAGGAGCACG^ 74100 
VKFREHFRPFAPSVLKEHAA 

74101 GAGTGGTTCGGCCTCTCCGACAGCCCCTTCATGCTGCGGGCCACCCC^CCTCAAGCCC 74160 
EWFGLSDSPFMLRATPVLKP 

74161 GGCGTGCCCGCCATCACCCACGTCGACGGGACGTCGAGGATCCAGTCGGTCACCCGCCAG 74220 
GVPAITHVDGTSRIQSVTKU 

74221 GACACCCCCGCCTTCCACGACCTCATCCACGCC^CAAGGACCGTACGGGGATCCCC^TG 74280 

DTPAFHDLIHAFKDRi^-'-r 

74281 GTGCTCAACACCAGCCTCAACACCAA<^ 74340 
VLNTSLNTKGEPIAETPEDA 

74341 CTGCGCACCCTGCTCGGCTCCCGGCTCGACCACCTGGrGCTCCCGG^ 74400 
RTLLGSRLDHLVLPGLivt> 



74521 



GGCCGGACGGCGGCCCGCTCATGAGCGCCCCGOTGGGCGAGCGGACCCGGTOCCG 74460^ 
GRTAARS * 

TCGAACGCGACATCGCCGCGATCTGGGCCGAGACCCTCGGCAGGGACAGCGTCGGCCCGC 7452 
ERDIAAIWAETLGRDSVGPH 

ACGAGGACTTCGCCGCGCTGGGCGGCAACTCCATCCACGCCATC^GAT^CC^CCGGG 7458 
EDFAALGGNSIHAIKITNRV 

TGGAGGAACTCGTCGACGCCGAGCTGTCC^TCCGCGTCCTGCTCGA 74640 
EELVDAELSIRVLLETRTVA 



0 



0 



74581 

74641 CCGGCATGACGGACCACGTCCACGCCACG^ 74700 
G M T D H V H A T L T G E R D R m n t d ^ 

" 38 ' 






74701 CCTGCCCCGGCTGCTCGACCGGATC^ 
74761 CCTCGACACCTAC^TCTGGGGAG^ 
748 21 CGTCACCCTCACC^ 
748 81 CCGGGCGCTC^CGCCKAA^ 

,4941 CCGGCTGCGCGAAGCCCTCCGTGCGCGGGACGTCGACACCGGCGGACTC^CGTACAGCC 
rLREALHARDVDTGGIj' vu 

75 oox cg^cccgac^cgg^ 

75 061 CC^CGAGGGCGGCGAACAC^ 

75 1 2 1 ACGGGCCGCCGGCCTGCTGCCCGCCX3TCGACGCCGTGATTOTCTCCGACTACGGGTACGG 
RAAGLLPAVDAVIVSDXl>i^ 




7 4760 



75181 CGTGTGGGAGCCCGACACCGTCGCCCGGCTCGCCGCACACCGCGAACTCGGCCCGTCCAC 
VWEP DTVARLAAHRELGPSi 

•75241 CCTGGTCGTCGACTCCCGCCGGCCOGCGCGCTTCACCGCGCTG 



75301 



75361 



LVVDSRRPARFTA LRASAVK 

ACCC AACC ACGCGG AGG CG CTGCG C CTGCTCG ACG CCGGCG AACCCCCG CCCGGC CCGGC 
PNHAEALRLLDAGEPPPGPA^ 

CAGGGCGGACTGGGCGGCCGCCCTCGGCGACCGGCTCCTGCGC^ACGGGAGCCGAACG 



74820 



74880 



74940 



75000 



75060 



75120 



75180 



75240 



75300 



75360 



ADWAAALGUKLiu^LTGA 

75 421 GGTCGCCCTCACCCTGGACGCCGACGGATCACTGCTC^CGAACGCGACCGGCCCCCGGT 
VALTLDADGSLLFERDRffv 

75481 CCGCACG^CGCCGGGGGCAGCCGGGCA^ 

75541 CTTC^CCGCGGCCCTCACCCTCGCCC^ 

FTAALTLAL.AAGADSAVAHr- 

•75601 ACTGGCCTCCGCCGCCGCCGGCACGGCCGTCGCCACC^ 

75661 CGACGAACTGCGCCGACTGCTCSGCGGCACCGGCAW 

DELRRLLGGTGKVCRTG 1 u * ^ 

75721 CGCCCGGCTGCTCGACCCGGCCX3CCCGCGACCGCCGGGTCGTCTTCACCAACGGCTGCTT 
ARLLDPAARDRRVVFTNfatr 

75781 CGACCTCCTGCACGGCGGCCACGTCTCCTGCCTGAGCCGGGCCAAGGAACTGGGCGACCT 
DLLHGGHVSCLSRAKELGU^ 

75841 GCTCGTCGTCGGCGTCAACTCCGACGCGAGCGTCCGACGCCTCAAGGGCCCCCGTCGCCC 
LVVGVNSDASVRRLKGPRRP 

75901 GGTGATCCCCCTCGCCGAACGCATGCGCGTCCTCGCCGCCCTGAGCTGCGTGGACCTCGT 
VIPLAERMRVLAALSCVUL.V 

75961 CGTGCCCTTCGACGACGACAGCCCCGCCX3CCCTCATCGAGGCCCTCCGCCCCGAGGTCTA 
VPFDDDSPAALIEALRPEVX 

,6021 CGCCAAGGGCGGGGACTACACCCTCGCGACCCT^ 

76081 CGGCGGOSTCGTCCACCTGCTCCCCAGC^ 

GGVVHLLPSVADTSTTUl 

76141 GCGCATCCACGCCCTGTCCAGGACCGGCGAGGGAGACACCCCATGAGCCACGCCATCGGA 

M S H A 1 W> 



75420 



75480 



75540 



75600 



75660 



75720 



75780 



75840 



75900 



75960 



76020 



76080 



76140 



76200 
(orf8)



RIHALSRTGEGDTP* 

76201 CCGAGCCGGCTC^TCCCrcCCATCCG 76260 
PSRLIPAIREALGDEKDPRL 

76261 GCCCTCTACGTCC*^^ 76320 
ALYVHVPFCSSKCHFCDWVT 

76321 GACATCCCCGTCGCACGCCTGCGCGGCGACAGCCGGGAACGCTCGCCCTACGTCACCGCC 76380 
DIPVARLRGDSRERSPYVTA

76381 CTCTGCGACCAGATCCGCTTCTACGGCCCCCAGCTC^CCCGGCTCGGCrACCGCCCC^ 76440 
LCDQIRFYGPQLTRLGYRPE 

76441 GTCATGTACTOGGGCGGCGGCACCCCCACCCGGCTCACCGGCGACGAGATGA 76500 
VMYWGGGTPTRLTGDEMTAV 

76501 CACCAGGCCCTCGACGACGCCTTCGACCTGACGGGACTCCGCCAGTGGTCGGTGGAGAGC 76560 
HQALDDAFDLTGLRQWSVES 

76561 ACCCCGAACGACCTCGACCCCGCCACCCTCGACACCCTGCGCGGCCTCGGCGTCACCCGC 76620
TPNDLDPATLDTLRGLGVTR

76621 GTCAGCGTCGGCGTCCAGTCGCTCAACCCGTACCAGCTGCGCAAGGCAGGCCGGGCCCAC 76680
VSVGVQSLNPYQLRKAGRAH

76681 TCGCGCGAACAGGCCCTGGCCGCCGTCCCCCTGTTGCGC^CGCCGGCATCGACGAGTTC 76740
SREQALAAVPLLRRAGIDEF

76741 AACGTCGACCTGATCGCCGGCTTCCCCGGCGAAGCCGTCGAGTCCTTCGAGGAGACCCTG 76800
NVDLIAGFPGEAVESFEETL 

76801 CGCACCGTCCTCGCGCTCGACCCGCCGCACGTCTCCGTCTACCCCTACCGCGCCACCCCC 76860 
RTVLALDPPHVSVYPYRATP 

76861 AAGACGGTCATGGCCATGCAGCTCGACCGCGAGTTCGTCGAGGCCCGGAACCGGGACGGC 76920 
KTVMAMQLDREFVEARNRDG 

76921 ATGATCGACGCCTATGAACOTGCCATGGCCGCGCTCGGCGCCGCCGGCTATCACGAGTAC 76980 
MIDAYERAMAALGAAGYHEY 

76981 TGCCACGGCTACTGGGTGCGCGACGCGCGCCACGAGGACCAGGACGGCAACTACAAGTAC 77040 
CHGYWVRDARHEDQDGNYKY 

77041 GACCTGGCCGGCGACAAGATCGGCTTTGGCAGCGGCGCCGAAT^ 77100

DLAGDKIGFGSGAESI IGHH 

77101 CTGCTCTGGAACGAGAACAGCGCCTACGCCCGCTACCTGCTCGCCCCCCGCGAGTTCTCC 77160 
LLWNENSAYARYLLAPREFS 

77161 GCCGCCCAC^GTTCACCACCGCCGAACCCGACCGCCTGACCGCCCCCGTCGGCGGCGCG 77220 
AAHRFTTAEPDRLTAPVGGA 

77221 CTGATGACCCGTGAAGGCGTGGTCTTCGCCCGCTTCCGCAGACTGACCGGCCTGGACTTC 77280 
LMTREGVVFARFRRLTGLDF 

77281 GCGGACGTCCGCGCCACACCGTACTTCCGCCAGTGGTTCGAGCTCCTGGAGCGCTGCGGC 77340 
ADVRATPYFRQWFELLERCG 

77341 GGCCGCTTCGTCGAGACGCCGTACAGCCTCCGCCTGGAGCCGTCCACCATCCACCGCGCC 77400
GRFVETPYSLRLEPSTIHRA 

77401 TACATCACCCACCTCGCCTACACCATGGCCCATGGCCTGGCCCCCGAACGCGCCTGA 77457 
YITHLAYTMAHGLAPERA*

SEQ ID NO: 2 ORFS BLM gene cluster ORFs 31-40 

(Note: this part is on the reverse strand and the last nucleotide (186
the first (1) on the whole cluster of 77457 bp. Also, the last ORF (40) is
incomplete and contains frame shifts.)
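
The relationship between a nucleotide row and the amino acid row printed beneath it, and between the two strands, can be checked with a short script. The following is a minimal sketch only, not part of the original listing. It assumes the standard genetic code; the function names `translate`, `reverse_complement`, and `mirror_position` are hypothetical helpers introduced for illustration; and the example row is the one printed at positions 63121 to 63180 above. Rows damaged by the scan, and rows where the listing shows M for an alternative start codon such as GTG, would not be expected to match.

```python
# Minimal sketch: compare one nucleotide row of a sequence listing with the
# amino acid row printed under it. Assumes the standard genetic code.

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Codons enumerated in TCAG order line up with the AMINO string above.
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


def translate(dna: str, frame: int = 0) -> str:
    """Translate a DNA string in the given frame (0, 1 or 2).

    Codons containing anything other than A, C, G, T become 'X'.
    """
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(frame, len(dna) - 2, 3)
    )


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def mirror_position(pos: int, length: int) -> int:
    """Map a 1-based position on one strand of a segment of the given
    length to the 1-based position of the same base pair counted from the
    other end. The segment length is a caller-supplied parameter."""
    return length - pos + 1


if __name__ == "__main__":
    # Row 63121-63180 as printed in the listing, with its translation.
    row = "GGACACTGGGCCGAGCTGATCCGGCGCGAGCGGGTCACCCTGTGGAACTCCGTCCCCGCG"
    printed = "GHWAELIRRERVTLWNSVPA"
    print(translate(row) == printed)
```

A row that fails this comparison is a candidate for scan damage in either the nucleotide line or the amino acid line; the script cannot say which of the two is wrong.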






MTENLPSCPECSSAYi 
(orf31) 

m gacaaccccgaagaccgccc^^^ 180 

30, ATCGA«*GTTCGGCGC^ 360 

361 ACGCCGGCCCAGGCCCTGCCCAGGCTCCACTACGCCGCGGCGCAACCGAGCCGGAACGGG 420

421 GCCCGGGCCCGCTCCAAGTCCCGrT^ 48 ° 

481 C^GGGTCGCCGTCCCCGTTCGCACGCGTCGTACACGCCACCACGCACGGCACGGAACTC 540 

541 CCCGAACTCGCCACGrrCCCCAAGTCCCCGCGTGCCCGGATCCGCCCGGACCX^CGTCGG 600 
601 TCCGCCCGCCGGGCCGCGGCCGGGTCCCCGGGCCGCGGCGGGAGGGGGTCTCGCGCCGTG 
661 gAACGCCGGCCGGAAATTTACGTATAGGTAGAGATCCCGGCGAAGCGATCGGCGCGTTAT 

721 GGCAGCATCCGCGCCGGCCCGCCGCGCAGTTCCTCGGTCCCGGACCGATGGCGTCAAAAG 780

781 TGAGCGACGAAATCGCCGGATCGCGCGAGGACCGTCGCGGGCCGCACGAGGACAACCGGG 840 

841 GGATATATCAGCGCATTCCCAGGTCACGCGTTGACTGGAAATCGCCTACTTATCGCGTCA 900 

901 cgcctgxagggatcatggccgggaatggcctcaga™ 960 

(orf 32) 

96 1 tccgactgtcggcagcgctcggggatcacggtgacgaatg 



660 



720 



S D C R Q R 



1021 CAA^CG^ 1080 

1031 ATCC^CACAAA^^ " 4 ° 

H41 GCCACGCGCCTGGCC*A^^ »" 

1201 CCCG D A T CCCGA T CG v ^ 

1261 CCCCTGGATCCCG^CATACCGGCC^ »" 

1321 G d ACGTTCTC^ 1380 





»« ^-^^ 1440 

lMl ccc^-cc^^ 1500 

ASElAYVLYPi 

150l CTC^CCT^^ 1560 
V VSYRD MAKI 

lMl ^ccc*^^ 1620 

1621 ac«c™J^^ 1680 

ADETRP RFV 

1691 c^^^ 1740 

l74X fTT c^ 1800 

1801 ^cc*^^ 1860 

1861 oaacccoo^^ 1920 

1921 r c^^ 1980 

1961 2040 

aM1 CCCCCC*^^^^ 2100 

2101 r c TO co T occo 7 ^ 2160 

2161 oacccoccccctcc^ 2220 

DG PGARR VAA 

2221 cccccccac^^ 2290 

22B1 cooccooca^^ 2340 

aM1 cxcc^c^^^ 2400 

2401 2460 

M6l ccoccc™^^ 2520 

2521 —--c^n^ 2580 

258 X CTCCCCA^CCAC^^ 2640 




LPMDHPR PAVQSERGETVGF 

2641 GCGCTGCCCGACGCGCTGGTCGCCGCXJCT 2700 
ALPDALVAALEKLGREQ^Al 

2701 CTGTTCATGACG CTG CTCGGCGCCTTCCAGGTCCTGCTGGCG CGTCACG CCGGGCAAG AG 2760 
LFMTLLGAFQVLLARHAGQE 

2761 GACATCGTGGTCGGCGTGCCGGC^C^ 2820 
DlVVGVPAAGRTRTETEPbV 

2821 GGmCTTCGTCAACACGCTTCCCTTGCGGGCGATCTGCGCTCCGGG 2880 
GFFVNTLPLRAICAPGLSFR 

2881 GACCTGCTGGACCAGGTGCGCGAGGCO^ 2940 
DLLDQVREAALGAFAHQDLP 

2941 TTCG AGGCG CTGGTCG AGG CG CTCG CACCCGAGCGCG ACCTCGG CCA C AATCCCCTCGTC 3000 
FEALVEALAPERDLGHNPLV 



3060 



3001 CAGGTCACCHTCCAGCTCCTCGGCACACCGGCGG 

QVTFQLLGTPAARPDLIGTE 

3061 GTCGAGCGGTACCCGGTCCAGGAGGCCGTCTCGCAGTT03ACCTGTCCCT 3120 
VERYPVQEAVSQFDLSLDIK 

3121 CGGGCCGACGACGGTTCCTACCGGGGGATCCTGAACTACTCCCCCGACCTGTTCGACCGA 3180 
RADDGSYRGILNYCPDLFDR 

3181 CGCCG^TGGAGGTGCTGGTCGGCCACTACCTGACGCTGCTCGGCGCCGCCGCCGCGGAC 3240 
RRMEVLVGHYLTLLGAAAAD 

3241 CCGGGCCGCCCGATCGGTGAGCTGCCGCTGTCCGACGGGGCCGAACGGCTGCGGCTGCTC 3300 
PGRPIGELPLSDGAERLRLL 

3301 GACGGGTTCGGGAAGCGGGACGCGGCGTACGCCGGGCCGGGAAGCGTTCCGGAGCGGTTC 3360 
DGFGKRDAAYAGPGSVPERF 

3361 GCGGAGGTGGCGCGGACGGCACCGGACGCGCGGGCGGTGACGTGTGGCGCGACAACGCTC 3420 
AEVARTAPDARAVTCGATTL 

3421 ACCrTCGCCGAGCTGAACGACCGGGTGGAGCGCCTGGCACAGGCACTGCTCGGCGCCGGG 3480 
TFAELNDRVERLAQALLGAG 

3481 GTCACCCGCGAGACGCCGGTCGCGGTCCGCCTGCCCCGTTCCACCGACAGCGTCGTCGCC 3540 
VTRETPVAVRLPRSTDSVVA 

3541 CTGCTGGCCGTCATGCGGGCGGGCGGCGTCTACGTCCCCCTGGACCCCGACTGGCCCGCG 3600 
LLAVMRAGGVYVPLDPDWPA 

3601 QACCGCACCGCCTACATC^ 3660 
DRTAYILDDTAASVVITRDL 

3661 CCCGCACTCCCOSGTCGCCTC^^ 3720 
PALPGRLHVDPRRPAADGLV 

3721 CCCGCGCCCCGCATCGACCCCGATCAGGCCGCGTACGTCATCTACACGTCCGGCTCGACG 3780 
PAPRIDPDQAAYVIYTSGST 

3781 GGCGCGCCX5AAGGGCGTCGTCGTCCGGCACCTGCTCCCTGAACCACCTCACC^ 3 84 0 

GAPKGVVVRHRSLNHLTSAL 







364l ««C«CCrrTCr^^ 

euMLAGHELFlvrc 



4021 CCCTCGCC^^ 

4 o 81 tcgcagctc.aactc^ 



3900 



3960 



4020 



4080 



4140 



4200 



4141 c^tcg^ccc^ 

4201 CCCACTCGC^^ 4260 



4261 C^TCCGACCC^^ 020 

4321 CGCGTGCTciACGACCGA^GCGACCCGTACCL G E I Y L 



CGTGGGCGTCGCCGGCGAGATCTACCTC 4 380 
RVLDDRLRPVPVGVAG 



4381 «CGQMCCQ«CI^^ 



»_WW»ww 4440 

gT77T"a"r" g ylnrpa 



4441 gtcgccgacccctaccc^ 4500 



V A 



4S01 CGCTGGCGCCCCGACGGCACCCTCGAATACCTGG^ACGC^CCG^CGACCAAATCAAGATC 4S60 
RWRPDGTLEYLGRTDUU 



4561 CGCGGCTTCCGCGTCGAACCCGGCGAAATCG^GGCCGTCCTCACCC^CCACCCCGCCGTC 

RGFRVEPGEItAVi* 



4620 



CGCGCGGCTGGTCGCCTACGTCACGCTCGCG 4680 
K™e" A~~A~ V V D D A H A 



4621 AAGGAAGCCGCCGTCGTCGACGACGCGCACGCCwww*.---- --- y «p L A 



4681 CAAG G GCGGCGG^ 4740 

4741 CACATGGTGCCGTCGGCGG'IWTCGTCCTGGAGGCGCTGCCACTGACGTC 4.00 

HMVPSAVVVLEAuru 

480 X «0««GC^ "« 



L D R 

GTGGCGCCGOTCGACATGGTGGAGGAGGT^ 4920 
4 921 GTCGACCGGGTCG^TGTGCACGACGAC'^ * * £~ ~q q „ s L t V 
4981 GTCCAGGTGATGACCC^TACGAAAOj..^. ----- ---^ p L R E L 

CGTC 

4^ 



4861 

VAPRDMV£.£.vv^w- 

ITTCTTCGAGCTGGGCGGGCACTCGTTGCTGGTG 4980 
VDR V G VKDDFFE 

GCTGCTCGGCX3TCGAGGTGCCGTTGCGGGAGCTG 5040 

v q v""m"t "r irkllgve 

5041 TTCG ACG CCGCGACGGTCG AGGAGCTCX5 CCGCCCG CGTCCGCGCCG CACGGACCG AGGGC 5100 
FDAATVEELAARVRAARTEG

5101 ^^fTTTTTT^^ 5160 

" 2i ^rrfTTT^^^ 5280 

5281 CTGCGGATCCTGGTCGAGCGGCACGAGACGC^^ 5340 
53 «1 G^CCCCACCAGG^^ 

540! GTGCGGATCGAGG^^^^ »«" 

546! GCGCGCACCCCGTTCCGGCCCGCGGACGGCGtt^ 5520 

5521 ^^p A ^ A ^*^^^^L^V^V^^^^^^^^^^^^^^^*^^^^^^ 5580 

5581 GTCGACATCCTGGTGGACGAATTGGGGCGCCTCTACWGGAACACGTCACGG^TGACCCC 5640 

5641 GCCGGGCTCCCTCCGCTCGAC^CAG^ 5700 

aglppldvq 

570! ™««™^^ 5760 

576 i cccrcoar^ 5820 

5821 GAGACCGTCGAGTTCCCCCTGCCCGCACCACTGGTCGCGCGGCTGGAAGCGCTCTCCCGG 5880 

etvefplpaplvarleai.uk 

5881 WflCMOGCOrCAC^^ 5940 

5941 TACACCGGTC^ 6 °°° 

6001 ACCGAGCCCCTGGTCGGC^CTTCGTCAACACCCTTCCGGTAC 6060 

6061 GAGCTGTCGTTCCGCGCCCTGCTCGACCGGGTC " 20 

6121 CATCAGGACCTGCCCTTCGAGGCGCTGGTCX^GGCGCTCGCGCCCGAGCGCGACCTGGGC 6180 
HQDLPFEALVEALAPERDwu 

6181 CACCACCCTCTCOTCC^^ 6240 

6241 CTGCACGGCACGGACTGCGTCTCGCTCGGCTTCGGCGGTGTGA^ 630° 

LHGTDCVSLGFGGVTSKtw^ 
6301 TCCCTCGACGTCGTCTCGGGGCGGCGGGGGAAGCGGTGCGTGCTGACGTA^ 6360
SLDVVSGRRGKRCVLTYCPU 

6361 CTGTTCGACCGGCCCCGCATGGAGGTGCTGGCCGGCCACTAC^CCCTGCTCGGCGCG 6420 

6*21 GCOKCGACGATCCC^TCTC^^ 6480 
AADDPGLRVGDLPLSDLFvck 

6481 ctgcgcctgctokcgggtcccgcccgcggtac^ 6540 

LRLLGGSRPR^L 

.mi ccmacciw^^ 6600 

PDAFAAQVKA i rw« 
6601 GACTCGACG^CG™^ 

666! CGGCGC^CGGCG^ «» 
6721 GCCGTCGTGGCCCTCCTGGCCGTCCTGCGGGCGGGCGGCGTCTATG 6780 

678! OAOTGCCcicCGQCC^ 684 0 

EWPSGRVAHVLDETAAPVVI 

6841 ACCCGCGACC^CCGCCGATCCCGGCCG^ 6900 
TRDLPADPGRVHLDPRQAPA 

6901 GACGACCGGGATCCCCTGCCGCGCCTCCACCGCGACCAGGCCGCGTACATCATCTTCACC 6960 
DDRDPLPRLHRDQAAYIIFT 

6961 tcgggctccacotgcgcccccaag^ 7020 

SGSTGAPKGVVVRHGSLYHLi 

7021 CTGGGCCACGTACOTCGCATCGCGGA^ 7080 
LGHVRRMAEGGPRRNVAHTT 

7081 GCGATGACCTTCGACCCGTCGCTGGAACAGTTCCTGTGGCTCGTCGCCGGACACACCCTG 7140 

amtfdpsleqplwlvaghtli 

7141 CACGTCGCGCCCGAGGAGGTCCGCCGCGATCCCGAGGCGCTGGTGGCCCTGGTGCGGCGC 72 oo 
HVAPEEVRRDPEALVALVKK 

7201 GCCGCGATCGACGTCCTCAACGTCACCCCGTCCCACCTCACCCTGCTGATCGAGGCCGGG 7260 
AAIDVLNVTPSHLTLLIEAfc 

7261 CTGCTGGAGGGCGACCGGGTGCCGGGTACGGTCCTGGTGGGTGGCGAGGCGGTGCCCGCG 7320 
LLEGDRVPGTVI.VGGEAVFA 

7321 GCGCTGT<X3CGGACCCTGCGCGAACGGAC13GGAGCCACCCGOTOTCAACCTGTACGGG 7380 
ALWRTLRERTGATRFFNLYG 

7381 CCTACGGAGGCGACGGTCGACGCCACCTGCCACGACCTGTCCGACCCCGCCGACGTCCCC 7440 
- - -ATVDATCHDLSDPAUV r 



P T 



7441 GTCATCGGCACCCCACTCCCCCACACCCACGTCCGCGTGCTCGACGACCGACTCCGACCC 7500 
VIGTPLPHTHVRVLDDRLKK 

7501 GTACCCGTGGGCGTCGCCGGCGAAATCTACCTCGGCGGAACCGGCCTGGCCCGCGGCTAC 7560 
VPVGVAGEIVLGGTGLARGY 

7561 CIOACOGC^CCCIt^ 7 "° 

7621 «*K»«»»^^ 7680 



7681 TWCIOGttix^^ 774 

YLGRTDDQIK IK,jr 



0 



,41 ATCGAAGCCGTCCrCACCC^CCACCCCGCCGTCAAGGAAGCCGCCGTCACCG^ 7800 
IEAVLTHHPAVKEAAVTV*! 



,801 GACGACGGTGCCGCCCGGCTGGTCGCCCT "60 



7661 GATTCGGCCGACGGCGCCCCGGACGCCCAGGTCGAGGAGTCGAACGCCGTCTTCGAGGCG 7920 
DSADGAPDAQVEEWNAvr 

7921 ACCCACACCGACGCCGCCGACGGCGAACTCACCTTCAACATCAAGGGCTCGAACGACAGC 7980 
THTDAADGELTFNIKOWNua 

7931 CTCACCGGTOCGCCGA^ 8040 

8041 CGGCTCCTGGAACGGCCGGCCGAGCGCGTCCTGGAGATCOTCAGTGGCACCGGGCTGCTG 8100 
RLLERPAERVLEIGSfai^.^" 

B101 ATGTGGCGGCTGCTGCCGCACGTCACCGAGTAI^CCGGAACCGACCTCTCG^GCCCGCC 8160 
MWRLLPHVTEYTGTDFSRPA 



8161 GTGGACTGGCTCCGGGACGGGCTGCGCCGCa3CCCCGCGCACCGGGTACGGCTGCTCCAC 8220 
VDWLRDGLRRRP* HRVRljljn 



8221 CGCGAGGCGACCGACTTCACCXKjCGTCCGCGCCGCGTCCACCGACCTCGTCGTCGTCAAC 8280 
REATDFTGVRAASTDLVVVW 

8281 TCGGTCGTC^GTACTTCCCCGACCGCGCCTACCTCGACACCGTCCTC^CCrcCGCCCTC 8340 
SVVQYFPDRAYLDTVLARAL 

8341 <»CaCttCGOC^^ "00 

8401 CCGCAGTTCTACGCCCGTCAGGCCCTCGCCCACGCCGGTCCGGGCGCGGCGGCGCGGGAC 8460 
PQFYARQALAHAGPGAAARD 

8461 GTGGCGCGCGCCGCCGGCGAGTTCGCGGC^TGGACG^C^GTTGCTGGTGTCCCCCGCG 8520 
VARAAGEFAAMDGELLVSPA 

8521 TACTICGCXWX^^ 8580 

8581 CGGGGACGGCACCGCAACGAGATGAGCCTCT 8 "° 

8641 GGTGACCGCCCGGCGGCCCCGGAGGCGGAGGTGCTWCCTCGGGCGACCAGGTGCACGAC 8700 
GDRPAAPEAEVLTWOUUvn 

8701 CTCGCGTCGCTGTCCGCCCGCCTCGGCCGCGGCWGCCCGGACGCCCTGCTCGTGCGCGGC 8760 
LASLSARLGRGGPDALLVRG 
876l 8920 

8980 

ssaa agc^ccc^ 8940 

.941 «CGACG»ei*TC^ 9000 

HDDGPLLVPHPvfCt*' 

S0 01 ^CACCCCCAC^ 9060 

9061 TCCTCWCTCGCCGAGCGGC^TCCWCGCACCTC^^ 9120 

912! GCGCTGCCCCGCACCGGCACCGGCAAGCTrcACCGGGGCGCGCTCGGCGGACTCGTGACC 9180 
AL P RTGTGKLDRGALGGLVT 

91B1 GCGGGCCGraCGCCCGK^ "40 

9241 CGGACXOTOBCOQMGOB^ 9300 

9301 AACTTCTTCGCCCTCGGCGGCGACTCCCTCCTCGCCGTCAGGGCTGTCGCCCGGTGCCGC 9360 
NFFALGGDSLLAVRAVARCK 

9361 CGTGCCGGGGTCCGACTGACCGTCCGGCAGTTGCTGAGCGAGCAGACCCrCGCCGCGCTC 9420 
RAGVRLTVRQLLSEQTVA** 

9421 CCGGCGGCCCTCGAGGAGGAGTCTCAATGATGAAGTCAAGCCGCTTGCGCGACCGGCAGC 9480 
A A A L E E E S Q m - m k s s r l R D R Q L 

(orf33) 

94B1 TCGGGGGTGAAGACCCGGTTGTCGCGCAGGAGAGCCCA^GGACGCTG^CCCGACGCCGT 9540 
GGEDPVVAQESPQDAGPTPL. 

9541 GCCAGGGCGATCACCMCTTCAACGTGTTTGCAGCCCTCGCCGCGCTTCTTGAGGTAGAAG 9600 
QGDDGLNVFAALAALLEVtv 

9601 TCCCGGTTCGGCCCCTCCCGCATCATGCTGGTTTGGGCC^ACATGTAGAACACTCXJTCGC 9660 
pVRPLPHHAGLGRHVEHS.su 

9661 AGGCGGCGGCTGTAGCGCTTGGGCCGATGCA^TTGCCAGTGCGACGAC 9720 
AAAVALGPMQVASATTv*vau 

9721 OGfflUMOGC^^ " 80 

9781 TCGCCGGCGGCGACGACGAACTCTCCGCCGAGGATCGGCCCCA 9840 

9841 ATGATCTCGGCCTGTGGATGGCTGCGGAACGTCTCGCGgAtCTGCTGGTCAATCCGCTTC 9900 
DLGLWMAAERLADLLVNFUU 

9901 ^CGGTCG^ 



PCT/US00/00445 

10020 



WO 00/40704 

SSSl tcctccccc^^ 

LPGQR g1jLSIj 

10 .„ Gcrr ^ 10080 

X.0.1 CGG^GACC^ 10140 

10141 c^cg^^^ 10200 

102 oi n«™^^ 10260 

1032 1 TCGGC^TGAC^^ 10380 

10381 10440 

l0 „x «*««^^ 10500 

105 oi agg,cgagga T ^^ 10560 

10S .X GACCACAGCGTCACACCGC^ 10 " 0 

PQRHTGLVDHRKfv 

„«1 TCG^TCCCGG^ 10680 

X0741 CATCTCAATCA^^ 10800 
X0801 GACCGCAGAGAACCATAAGCCACACCCGGCCCTCCCGGGCCGCCTAACAACTTACGGAGA 10B60 

l0 .« accatgac^g^^ 10920 

(orf34) 

l0M1 p^^ a ^^^^^^^^l^a^^^^^p^^v^d*^^^^^^^^^^^^^^" 10980 

1098 x «—^^ 11040 

xxo«i cggctctacgacagcccgcg^^ 11100 

xxx.i a T ccgaccga«:cccgc<^^ 11160 

IGPT P A A K PArur 

11161 Cn^GCT^TGCTCC^ 11220 

11221 "-"C-o^ 11280 

11281 TCCAGCMO^CC^ 1»« 
„,„ ^*«H«OC«^ 11400 

„4oi cccg^cctgtc™ 11460 

11521 amTCGCGGCGATCGC^^ 1«80 

11581 GACCCCC^GGACCCGTACC^ATCGACC^^ 11640 
DPEDPVLRIDPAYHSfAf^" 

11641 GCGGCCGCCCXKCGGGCGTACGACACCGTCACCGCGCTCATCGAGGACGAGCTGCGGCAC 11700 
AAARRAYDTVTALIEDELRM 

11701 GTCGTCCTGGACGCCGGTT^CTGCTGCTGGTCGACAACTACCAGGCGGTGCACGGCCGC 11760 

VVLDAGSLLLVDNYQAvn^n. 

11761 AAGCCGTTCGCCGCCGCCTACGACGGCCXSrcACCGCTGGCTCAAACGCGTCAACATCACC 11820 
KPFAAAYDGHDRMLKRVNIl 

11821 CGCGACCTGCGCCGTTCCCGGTCCGCGCGGCGGTCGGCCACCTCGCTGC^GTGAGGG 11880 
RDLRRSRSARRSATSLLV 

11881 AGGCACCATGGAT^CCCCCTCACCCG^^^ 1»«0 
(orf35) 

11941 CCGCCCCCC*GTGCGGCTGTGCGCGCTGCCG^^ 12000 

RPRVRLCALPYAGGTAAvt a 

12001 GGACTGGCCCGCCGCGCTGCCCCCCG^G^ 12060 
DWPAALPPG VEI ' 1 ' TAHLr 

12061 CGGCGACCGGTTCACOMCCGCCCCCGGC^ «120 
GDRFTEPPPATLEETAERLL 

12121 CGAGGCGCTGCCGCCGAGTGACCTGCCCACGGTCGTCCTCGGCCACAGC^TGGGCGCCCT 12180 

12181 GCTGGCGTATOAAGTGGCGGCGCGGCTCGCGGCGCGGGGCCGCGCCCCC^CCTGCTGAT 12240 
tGYEVAARLAARGRAPNLLI 

12241 CGCCGCGGCCTGCCSTCCCCCGC*CGTTCCGC^^^ 12300 
AAACRPPHVPPDASGPVTEA 

12301 CGAGC^CCGCCACCCTC^ 12360 

12361 ACT^TGG^GOTGTGCTCCCCGCCCTCGTCGCCGA "420 

12421 CCGCCCGCGGCCCCGCCCGCTCGACCTCCCGCTGAgGTCTACATCGGCGCCGACGACGA 12480 
RPRPRPLDLPLKVYIGADDD 

ccgc^c^ 12540 

„,« cg^cctc^cccc^^ 12600 

12601 CGTCGCCACGGACCTCGCCGAAGCCGAGGTAGGGGCATGACCGCGCGCGTCGACGCCACA 12660 
v A T D L A E A E V G A m * t a r v d ft T 

(orf36) 

12661 CCCACC™™^ »™ 

12721 CIWJCC^TOT^^ 12780 

xa,8i a*«*«cgttc^^ 12840 

12B41 TTCGCGCCCGCC^CGCCC^^ 12900 



F A 



12901 CTGACCGTCCCCTAC^CIGGGGCTCGC^TGCTGATCA^ 12«0 

12961 CCCACCGC^CGCTGCTCGTCGCC^^^ 13020 
PTAALLVAAAVAGVFAi* r 1j « 

»•» CCCACCAXGCGCCTOCXC^C^^^^ 

130.1 GCCTACGCCCTCGACTCCGTCACCGAG^^ " 14 ° 
AYALDSVTEEVvr 

13141 GGCGGCCTGATCGCGGTCGOTGCACCGCT^CGTCGATGATCACGGTCATGGTGCTGATC 13200 
GGLIAVAAPLASMITVnvu 

13201 GCGGCCGGTACaSCCTGC^CGTGCTGTCCGCCGCGACCGCCGCCGCCCC^ "260 
AAGTACFVLSAATAAAPASw 

13261 GAAGCCGACGA^ACCGGCCGCACGGCCGGCCCATGGCTCTGCCCG^ATGCGCACGATC 13320 
EADEDRPHGRPMALPGMRTI 

13321 GTGCTGTCC^CGGCGGCgWcTCKTCGTCGGGGTGCTCCAGG^ 13380 

VLS FGGVGLiVVGVLQVVJjrr 

13381 ATCGCCGACCACGCGGGCTCGCCCGGCGCGGGCGGCATCCTG^TCCATGCTGTCGGCG 13440 
IADHAGSPGAGGILLSMLSA 

13441 GGCAGCGCGGTCGG CGG CCTCGCCTACGGG CGG ATCG C ^^GCG CTCG AOTC^CffTGCGG 135 °° 
GSAVGGLAYGRIAWRSTPVR 

13501 CGGTTCGTGGTGCTCGTCACCGGGTTCACGCTGGC^ 13560 
RFVVLVTGFTLAVLPLCLia 

13561 AGCCCGGTGCCGGCCGGGGCCTTCGCCCTCCTCGTGGGACTCT^ 12620 



S 



PVPAGAFALLVGLCLAPLF 



13621 ACCACCGCCTACCTGCTGGTCAACGACCTG^ "680 

13681 GCCAACACCTCGGTCTCCACGGCCAATAACGGAGGGTTCGCCGCGGGCAGCGCCGCCGCC 13740 
ANTWVSTANNGGFAAGSAAA 

13741 GGTGTGCTGCTCGACTCCCGGG^CCCCACC^ 13800 

13801 GCCGCGACCGCCGTCATC^CCGTTCTGCGCCGCCGGACC^CTCCTCGGaCC^ACAC 13860 
AATAVMTVLRRRTLLLGAGH 

13861 CCCGAACCGGCCGCCGCCACACCCGCCGA^^ 13920 
PEPAAATPADRTAPAEAEE 

13921 ACCGATCGTGTCCAAGAACGCGGCGCACTGGTCGCGCATCCGCACAGGGGACGCCCCCGG 13980 
MSKNAAHWSRIRTG DAKU 
(orf37) 

13981 CGTCGTACTCGCCGTGGACrTCTACKAATOGGCCGCCAGGAAGCCACCyCCGCCACCT 14040 
VVLAVDFYGTGRQEATFRHli 

14041 gtgtoacctcctcakgatc^ 14100 

14101 CGACTGGTCCACGGCCACCGGCGCC<3GTCACCTGCGCTGGTGGACCGAGGGGCTCGACAC 14160 
DWSTATGAGHLRWWTEG LDT 

14161 GGTCCTCGCGGGACGGCCGGTGCGGGCCCTCGTCGGCTACTGCGCGGGCGGCGTCTTCGC 14220 
VLAGRPVRALVGYCAGGVFA 

14221 CTCGGCCCTCGCCGACGCCCTCSTCG^C^GAGGGCCA^ W° 

14281 CAACCCCAGCGCGCCCGGCGTCGCCACGCTCACCCGCGACTTCCGCGGTCTGATCGCCGG 14340 
NPSAPGVATLTRDFRGLIAG 

14341 CATGGACCTCCTCACGGACGGGGAACGCGCCGCTCTGCTGGCCGAGACGACCGCGATCCG 14400 
MDLLTDGERAALLAETTAIR 

14401 GCG<3GCACAmCCCCCGACGCGCTGCTAC^GTCGCCGAACGCTACGCCGCCCTOTACCG 14460 
RAHAPDALVPVAERYAALYR 



0 



14461 CGAGGGCTGC<3ACCTCCTGTGCGAGCGGCTCGGCGTGGA03CCTCCpCGGCGCCGAACT 1452 
EGCDLLCERLGVDASFGAEIj 

14521 GGCCGCCGTCCTCCACTC^ 14S80 
AAVLHSYLAYLTAALDVPPT 

14581 CCCGCTGTGGCGCGGCGCCGTCTCGCTCACCTCCCGCGAGCACCAGGGC^CCGACTTCAC 14640 
PLWRGAVSLTSREHQCTDFT 

14641 CGACGTCGAGCACGGCTTCGACGTCGCCCGTGCCGAACTGCTGAGCTCCCCCCAGGTCGT 14700 
DVEHGFDVARAELLSSPQVV 

14701 CGCGGCGCTGACCGCGCTCCTCCGCGAACACGAGGCGAGCCGATGACCCTCACCCTGCGG 14760 
A A L T A L L R E H E A S R m - t l t ^ r 

(orf38) 

14761 GACGCCTTCCTCGACCAGGCCGCCCGGACCCCCGACGCCCACGCCGTCGTA^ 14820 
DAFLDQAARTPDAHAVVHGD 

14821 ACTGTATGGACGTACCGCGAACTGGAACTGC 14880 
TVWTYRELELRAGRMARTLA 

nasi o^c^cj^ 

M ,„ fc^aitoc^^ 15000 

X500X CACCCGCCGGA^^^ 15060 

iso.1 gagcacccctc^^ 15120 

15121 CCG^CGC*^ 15180 

151BX tccagtgcc^^^ »™ 

«2«1 CAGGTCCGCGCGCGCTA^ 15300 

15301 ^^^^^^T^TTTT^fTT 0 15360 

15361 CICBH**^ »«"» 

„«„ Tccrxcrcj^^ 15480 

15481 CTGCCCACCGGGCAACTCGTCATCGGCGGTOAGGCGCTGACCGGCTCCGOTCTCGGACCC 15540 

LPTGQLVIGGEAL. i 

15541 TGGCGCGCCGCGCACCCCGACG^^^ 

VJ R A A H P 

15601 GTCGGCTGC^CGO^ »«» 

15661 ATC^CCG^CGCG^ "™ 
l57 „ G^CG^^ 

15781 Ot»«CCC^ 15840 

15841 ----OCACC^ 15900 

15901 OTGCGCGCGGACGGGCAGGTCA^GGTCTC "960 

15,61 GCCGTGCTCOTCGGCCACGCGGGGGTGAGGGACTCCGCGGTCGTCGCCGTOT 

AVLRGHAGVRU^-« V * ^ 

16021 GACG^CcW^ " 0B ° 
16081 GCGCCGGCGCGGCACGCGGCCGAGGCGCTCCCGCraWCATGGTGCCGGCGACGTTCGTC 16140 
APARHAAEALPPVMVPATFV 
16141 ACCC TC CCC^^ 162 °° 

16201 CCCCCTGCCGGCGACGCCGGGCCGGGCCA^^ 16»60 

PPAGDAGPGDK i 

16261 CTGCTGGCACGGGCCCTGGGCATCCCGGAGATCGACGCCGACGCCGAC^CCTGACGTC 16320 
LLARALGI PElDADAur 



80 



0 



16321 ^CGGCACCAGCATC^ 163 

16381 CTCG;*CTCAC<*CCGTCC^CGC^ 

LELTTVLRERTVRRI^^^vr 

16441 GACGCCGCCTCGCCCCTCGCCGAAGGAGTGCCCGAGTGACCGGTTCCGTAACGCTCACCC 16500 
DAASPLAECVPE' <orf39> 

16501 CCCTCGGCGGGATC^TCCCCAGGCCCCGCGGCGAGGGGCTCACCACCGGCGCCGAGTACG 16560 
LGGI I PRPRGEGLTTGAtxu 

16561 ACCTGGGGCCGCTCGGCGAC^^ "620 

16621 GCGAGCGCCTCGCCACCGACGGGCTGATCCTGCTGCACGGTCTGCCCACCGACGGAGA 16680 

LATDGLlLLHGLik'ii-'«*' v ' 



E R 



1668! GCGTCCACGGC^CCACGAC^ "™ 

1674! AGCGCTCCACCCCGCGCAGCGTGGTCAAGOTCAACATCTACACCT "800 

16801 CCGACCAGCCCATCCCGATGCACAACGAGMC 16860 

16861 TCTACTTCricTGCCACACCGCGCCOiACACCGGcW 16920 
YFFCHTAPDTGGATPIADGR 

16921 GCGCCGTCCTCGACCTCATCCCGGCCGAGGTCAGGCGGCGG^CTCCC^GGGGTCGTCT 16980 

16981 ACACCCGTACGTTCCGCGCCGACATGGGACTGAGCTGGCAGGAAGCG^CCAGACCGAGG 17040 

TRT FRADMGLSWQbAr w *■ » 

1,041 ACCGOWCGACGTC^CGCCAT^^ 17100 
RGDVERHCRAHGQEFSWD^u 

17101 ACGTCC^CGCACCCGCCACCACCGCCCGGCGACCGCCGTCGACCCC^ 17160 
VLRTRHHRPATAVDPGT^ac 

17161 AGGTGTGGTTCAACCAGGCGCACCTGTTCCACCCGTCCAGCCTGGATCCCGAC^^CC 17220 
VWFNQAHLFHPSSLDPDl^KU 

17221 AGGTGCTCCTtWAGACGTACG^CGAGAACGGCCTCCC »a.O 

17281 GCACCCCGATXCCCGACGCC^ 17340 
17 



17 



17 <01 ™»0«^^ 17460 

461 GAGCCGTGCCGACGCATCGGCACGCCGTCCTCCCGTCGGGGCGCTACCATCGCCGCTGTC 17520 

521 TCGGCCATCACCCCACCCGGGC^AC^CAACCGGCCGTGCACATCCCCGCCGTCGTCGCC 17580 

17581 ACGGCACGCGCGATCACCCGCGCCATGACrcCCCAGCCCGTTGTCACATCTGCGGAGGCG 17640 

1764 1 CCCC^TG^^ 17700 

1770 i c^cgIc^ 17760 

177S 1 CGGAG^GAGGCG^ 17820 

17821 AG^GCAGA^^^ 17880 

17881 CTCCCCGGTGCTGCGGGCCGGAGCGGTAG^GTCCCGCTCGATTCCGGGGCCACG^ 17940 



17941 CKAGCTCG^AC^ 18 °°° 

x.ooi acTOccGcaioxc^ 18060 



C R 



180 6i cccg—cc^^ 18120 

19121 g^cgcgccac^^^ 18180 

18 X81 GACC^CGGGCCG<^GCAAGGGCGTGGTCTCCGGCCAGCGCGCCG "240 

liM1 18300 

XB301 CA^CCACGC^^^ 18 "° 

1BJ61 a^ACc^c^ 18420 

XS421 CGTO^CCGG^ »«° 
18481 CCGGCCACCGCC^^ 18540 



XB54! TGCGGGCGGACGTTGAGGAGCTCCTGGGCGTCCCGCTGCTCGACGG "6 
R A D V E 



00 



1B601 AG AC CTG CGGCAAG ATCACGGTTG AG CGG CTCGG CGG CTC CCGGGAGGG CGGTTG C CGGG 
TCGKITVERLGGSREGGCR 

SEQ ID NO: 3 BLM gene PPTase ORFS 41 



213VVTAIAVAAPAGTAEESAEG 
961 GGACGACCGGACCGCCGTCCCGTAAAC^ 



240 D D R T A V P 



60 
160 



1 GGATCCTGCGCTACCCGGACTTCGCCCACrrG^ 

81 GCCGCGGTCTACGGGCATCTCCACATCCCCCGCGTC 

161 CCCGCGCGAGTGGCGGCCCCGGC^ 240 

241 TCTGGTGATCGCC^ \\» 

321 TCCTCTTCC^CGA^ 
27 LFPEEAAHVARAVPKRLHEFAIVKvc 

401 GCCCGCGCCGCCCTCGGCCGGCTGGG^^ 480 

53ARAALGRLGLPPGPLLPGRRGAPSWPD /3 

481 CGGGGTGGTGGGGAGCATGACGCACTGTCAGGGCTTCCGGGGCGCCGCGGTCGCCCGGG 560 

80 GVVGSMTHCQGFRGAAVARAADAASLG 106 

561 GGATAGACGCCGAGCCGAACGGGCCGCTCCCGGACGGCGTCCTCGCCATC 640 

107 IDA EPNGPLPDGVLAMVSLPSEREWL \lt 

641 GCCGGACTGGCGGCCCGCCGGCCGGACGTGCACTGGGACCGGCTGCTGTT " 0 

133AGLAARRPDVHWDRLLFSAKESVFKAW 159 

721 GTACCCGCTGACCGGCCTGGAGCTGGACTTCGACGAGGCCGAGCTGGCCG^ 800 

160 ypLTGLELDFDEAELAVDPDAGTFTAR 1B6 

801 (ttCTCCTGGTCCCGGG^ 

m-j T. LVPGPVVGGRRLDGFEGRWAAGEGL 



880 
212 



1040 
247 



1041 GGCGCCGGCCCGGCGGGCCCTCCGCCGTGCGGAGCGGAGGCCCGGCGCGGACGCGC U™ 

1121 AGTCGGCGACGCAGACGTTGCCGTTGGTCGAGTTGAGC^GCCCGACGATGTCGATGGTGTTGCCGCAGAGGTTGATGGGG 1200 

1201 ATGTGGACGGGGATCTGGATGACGTTGCCCGAGACGACGCCC^GGAGCCGACGGCCGCCCCCTTGGCGTTCGA 1280 

1281 GAGGGCGGTGCCGGAGACGCCGGOSAG^ "60 

1361 GGTGACACCTTCGTTCGGTCTGACAGGGTCGAGCTCACGGCCT 1440 

1441 CGAAGGTTTCGAATCGTGCGGCGGACGGG "20 

1521 CGTGCGCCATCTGTACAGCCCGGTCCCG^ "00 

1601 GGGGAGGCCATGAGCCGGATCGCGATCGTCGGGGCGGGTC^ 

1681 GAGCGGCTCTTCCCGTCACGAGGTGCTGCTCGTGTCCGAC^ 



1680 
1760 



1761 C 1761 

Box I Observations where certain claims were found unsearchable (Continuation of Item 1 of first sheet) 

This international report has not been established in respect of certain claims under Article 17(2)(a) for the following reasons: 

1. □ Claims Nos.: 
because they relate to subject matter not required to be searched by this Authority, namely: 

2. □ Claims Nos.: 
because they relate to parts of the international application that do not comply with the prescribed requirements to such 
an extent that no meaningful international search can be carried out, specifically: 

3. □ Claims Nos.: 
because they are dependent claims and are not drafted in accordance with the second and third sentences of Rule 6.4(a). 



Box II Observations where unity of invention is lacking (Continuation of Item 1 of first sheet) 



This International Searching Authority found multiple inventions in this international application, as follows: 
Please See Extra Sheet 



1. □ As all required additional search fees were timely paid by the applicant, this international search report covers all searchable 
claims. 

2. □ As all searchable claims could be searched without effort justifying an additional fee, this Authority did not invite payment 
of any additional fee. 

3. □ As only some of the required additional search fees were timely paid by the applicant, this international search report covers 
only those claims for which fees were paid, specifically claims Nos.: 

4. □ No required additional search fees were timely paid by the applicant. Consequently, this international search report is 
restricted to the invention first mentioned in the claims; it is covered by claims Nos.: 
1-45, 65-69, and 71-73 to the extent they read on ORF 8 

Remark on Protest 
□ The additional search fees were accompanied by the applicant's protest. 
□ No protest accompanied the payment of additional search fees. 

B. FIELDS SEARCHED 

Electronic data bases consulted (Name of data base and where practicable terms used): 
STN 

search terms: bleomycin, gene, operon, orf, open reading frame, cluster, aureus, verticillus, host cell, polyketide 
synthase, PKS 

BOX II. OBSERVATIONS WHERE UNITY OF INVENTION WAS LACKING 
This ISA found multiple inventions as follows: 

This application contains the following inventions or groups of inventions which are not so linked as to form a single 
inventive concept under PCT Rule 13.1. In order for all inventions to be searched, the appropriate additional search fees 
must be paid. 

Group I, claim(s)M5. 65-69, and 71-73, drawn to isolated nucleic acidi, gene clusters, multi-functional protein 
complexes, polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 8. 
Group II, cIaim(s)l-45, 65-69, and 71-73, drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes, polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 9. 
Group III, claim(s)l-45, 65-69. and 71-73, drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes, polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 10. 
Group IV, claim(s)l-45. 65-69, and 71-73, drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes, polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 11. 
Group V, claim(s)M5, 65-69, and 71-73, drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes, polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 12. 
Group VI. claim(s)M5, 65-69. and 71-73. drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes, polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 13. 
Group VII. claim(s)M5, 65-69, and 71-73, drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes, polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 14. 
Group VIII. claim(s)l-45. 65-69, and 71-73. drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes, polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 15. 
Group IX, claim(s)l-45. 65-69, and 71-73, drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes, polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 16. 
Group X, claim(s)l-45, 65-69. and 71-73. drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes, polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 17. 
Group XI. claim(s)M5, 65-69, and 71-73, drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes, polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 18. 
Group XII, claim(s)M5, 65-69, and 71-73, drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes, polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 19. 
Group XIII. claim(s)l-45, 65-69, and 71-73. drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes, polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 20. 
Group XIV, claim(s)l-45, 65-69, and 71-73, drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes, polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 21. 
Group XV, claim(s)l-45, 65-69, and 71-73, drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes, polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 22. 
Group XVI, claim(s)M5. 65-69, and 71-73, drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes, polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 23. 
Group XVII, claim(s)l-45, 65-69, and 71-73. drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes, polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 24. 
Group XvllI, claim(s)M5, 65-69. and 71-73. drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes, polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 25. 
Group XIX, claim(s)M5, 65-69. and 71-73, drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes, polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 26. 
Group XX. claim(s)M5. 65-69, and 71-73, drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes, polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 27. 
Group XXI. claim(s)I-45. 65-69, and 71-73, drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes, polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 28. 
Group XXII, claim(s)l-45, 65-69. and 71-73, drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes, polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 29. 
Group XXIII, daim(s)l-45, 65-69, and 71-73, drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes, polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 30. 
Group XXIV, claim(s)M5. 65-69, and 71-73, drawn to isolated nucleic acids, gene clusters, multi-functional protein 
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complexes, polypeptides, expression vector., and host cells, to the extent that these products read on ORF 31. 
Group XXV claim(s)l-45. 65-69. and 71-73. drawn to isolated nucleic acids, gene cluster*, mulu-functional protein 
complexes. rUrypepudes, expression vectors, and host cells, to the extent that these products read on ORF 32. 
Group XXVI cUim(s)M5. 65-69. and 71-73. drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes, polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 33. 
Group XXVII claim(s)M5. 65-69. and 71-73. drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 34. 
Group XXVIII cIaim(s)l-45. 65-69. and 71-73. drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 35. 
Group XXIX claim(s)M5. 65-69. and 71-73. drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes, polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 36. 
Group XXX. claim(s)M5. 65-69, and 71-73. drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes. rUlypeptidcs. expression vectors, and host cells, to the extent that these products read on ORF 37. 
Group XXXI, claim(s)M5. 65-69, and 71-73. drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes, polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 38. 
Group XXXII claim(s)l-45. 65-69. and 71-73. drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes, polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 39. 
Group XXXIII, claim(s)l-45. 65-69. and 71-73. drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes. polyP^"- expression vectors, and host cells, to the extent that these products read on ORF 40. 
Group XXXTV, claim(s)M5. 65-69. and 7 1-73.- drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes, polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 41. 

Group XXXV, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide 
encoded by ORF 8. 

Group XXXVI, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide 
encoded by ORF 9. 

Group XXXVII, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide 
encoded by ORF 10. 

Group XXXVIII, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide 
encoded by ORF 11. 

Group XXXIX, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide 
encoded by ORF 12. 

Group XL, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide 
encoded by ORF 13. 

Group XLI, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide 
encoded by ORF 14. 

Group XLII, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide 
encoded by ORF 15. 

Group XLIII, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide 
encoded by ORF 16. 

Group XLIV, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide 
encoded by ORF 17. 

Group XLV, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide 
encoded by ORF 18. 

Group XLVI, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide 
encoded by ORF 19. 

Group XLVII, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide 
encoded by ORF 20. 

Group XLVIII, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide 
encoded by ORF 21. 

Group XLIX, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide 
encoded by ORF 22. 

Group L, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide 
encoded by ORF 23. 

Group LI, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide encoded 

Group LII, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide 
encoded by ORF 25. 

Group LIII, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide 
encoded by ORF 26. 

Group LIV, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide encoded 

Group LV, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide encoded 

Group LVI, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide encoded 

Group LVII, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide 

Group LVIII, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide 

Group LIX, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide encoded 

Group LX, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide encoded 

Group LXI, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide encoded 

Group LXII, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide 

Group LXIII, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide 

Group LXIV, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide 

Group LXV, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide 

Group LXVI, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide 

Group LXVII, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide 

Group LXVIII, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide 
encoded by ORF 41. 

Group LXIX, claims 58-61, drawn to methods of coupling a first amino acid to a second amino acid using a polypeptide 

Group LXX, claims 58-61, drawn to methods of coupling a first amino acid to a second amino acid using a polypeptide 

Group LXXI, claims 58-61, drawn to methods of coupling a first amino acid to a second amino acid using a polypeptide 

Group LXXII, claims 58-61, drawn to methods of coupling a first amino acid to a second amino acid using a 
polypeptide encoded by ORF 22. 

Group LXXIII, claims 58-61, drawn to methods of coupling a first amino acid to a second amino acid using a 
polypeptide encoded by ORF 23. 

Group LXXIV, claims 58-61, drawn to methods of coupling a first amino acid to a second amino acid using a 
polypeptide encoded by ORF 25. 

Group LXXV, claims 58-61, drawn to methods of coupling a first amino acid to a second amino acid using a 
polypeptide encoded by ORF 26. 

Group LXXVI, claims 58-61, drawn to methods of coupling a first amino acid to a second amino acid using a 
polypeptide encoded by ORF 32. 

Group LXXVII, claims 58-61, drawn to methods of coupling a first amino acid to a second amino acid using a 
polypeptide encoded by ORF 38. 

Group LXXVIII, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide 

Group LXXIX, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide 

Group LXXX, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide 

Group LXXXI, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide 

Group LXXXII, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide 
encoded by ORF 12. 

Group LXXXIII, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide 

Group LXXXIV, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide 

Group LXXXV, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide 

Group LXXXVI, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide 

Group LXXXVII, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a 

Group LXXXVIII, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a 

Group LXXXIX, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide 

Group XC, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide 

Group XCI, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide 

Group XCII, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide 

Group XCIII, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide 

Group XCIV, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide 

Group XCV, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide 

Group XCVI, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide 

Group XCVII, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide 

Group XCVIII, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide 

Group XCIX, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide 

Group C, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide 

Group CI, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide 

Group CII, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide 

Group CIII, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide 

Group CIV, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide 

Group CV, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide 

Group CVI, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide 

Group CVII, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide 

Group CVIII, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide 

Group CIX, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide 

Group CX, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide 

Group CXI, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide 
encoded by ORF 41. 

Group CXII, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 8.
Group CXIII, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 9.
Group CXIV, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 10.
Group CXV, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 11.
Group CXVI, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 12.
Group CXVII, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 13.
Group CXVIII, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 14.
Group CXIX, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 15.
Group CXX, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 16.
Group CXXI, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 17.
Group CXXII, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 18.
Group CXXIII, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 19.
Group CXXIV, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 20.
Group CXXV, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 21.
Group CXXVI, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 22.
Group CXXVII, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 23.
Group CXXVIII, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 24.
Group CXXIX, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 25.
Group CXXX, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 26.
Group CXXXI, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 27.
Group CXXXII, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 28.
Group CXXXIII, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 29.
Group CXXXIV, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 30.
Group CXXXV, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 31.
Group CXXXVI, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 32.
Group CXXXVII, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 33.
Group CXXXVIII, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 34.
Group CXXXIX, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 35.
Group CXL, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 36.
Group CXLI, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 37.
Group CXLII, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 38.
Group CXLIII, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 39.
Group CXLIV, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 40.
Group CXLV, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 41.

Group CXLVI, claim 70, drawn to methods of converting an apo-carrier protein to a holo-carrier protein using a
phosphopantetheinyl transferase encoded by ORF 41.



The inventions listed as Groups I-CXLVI do not relate to a single inventive concept under PCT Rule 13.1 because,
under PCT Rule 13.2, they lack the same or corresponding special technical features. The MPEP states in Annex B
(page AI-36) that:

"Unity of invention exists only when there is a special technical relationship among the claimed inventions
involving one or more of the same or corresponding special technical features. The expression 'special
technical features' is defined in Rule 13.2 as meaning those technical features that define a contribution which
each of the inventions, considered as a whole, makes over the prior art. The determination is made on the
contents of the claims as interpreted in light of the description and drawings (if any)."

The following is the organization of the Groups: 

Supergroup A (Groups I-XXXIV): each isolated nucleic acid comprising any one of ORFs 8 through 41 (34 separate
groups);
Supergroup B (Groups XXXV-LXVIII): methods of chemically modifying a biological molecule using any one of
ORFs 8 through 41 (34 separate groups);
Supergroup C (Groups LXIX-LXXVII): methods of coupling a first amino acid to a second amino acid using any one
of ORFs 16, 17, 21-23, 25, 26, 32, or 38 (ORFs disclosed as encoding NRPSs) (9 separate groups);
Supergroup D (Groups LXXVIII-CXI): methods of coupling a first fatty acid to a second fatty acid using any one of
ORFs 8 through 41 (34 separate groups);
Supergroup E (Groups CXII-CXLV): methods of producing a bleomycin or bleomycin analog using any one of ORFs
8 through 41 (34 separate groups); and
Supergroup F (Group CXLVI): methods of converting an apo-carrier protein to a holo-carrier protein using ORF 41
(SEQ ID NO:3) (1 group).
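
The group arithmetic in this organization can be checked mechanically. The following is a minimal sketch, not part of the search report: it assumes the group ranges and stated counts exactly as listed above, and the names `to_roman`, `SUPERGROUPS`, and `check_supergroups` are hypothetical helpers introduced only for illustration.

```python
# Illustrative check only: the table is transcribed from the Supergroup
# listing above, and all helper names here are hypothetical.

def to_roman(n: int) -> str:
    """Convert a positive integer to a Roman numeral."""
    pairs = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
             (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
             (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, symbol in pairs:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)

# (supergroup, first group number, last group number, stated number of groups)
SUPERGROUPS = [
    ("A", 1, 34, 34),
    ("B", 35, 68, 34),
    ("C", 69, 77, 9),
    ("D", 78, 111, 34),
    ("E", 112, 145, 34),
    ("F", 146, 146, 1),
]

def check_supergroups(table):
    """Check that ranges are contiguous and match the stated counts."""
    rows = []
    expected_start = 1
    for label, first, last, stated in table:
        assert first == expected_start, f"gap before Supergroup {label}"
        assert last - first + 1 == stated, f"count mismatch in Supergroup {label}"
        rows.append((label, f"{to_roman(first)}-{to_roman(last)}"))
        expected_start = last + 1
    return rows

ROWS = check_supergroups(SUPERGROUPS)
TOTAL_GROUPS = sum(stated for _, _, _, stated in SUPERGROUPS)
```

Under these assumptions the six ranges are contiguous and the stated counts (34, 34, 9, 34, 34, and 1) sum to the 146 groups referred to as Groups I-CXLVI.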

1. The Groups within Supergroup A (Groups I-XXXIV) lack unity of invention for the following reasons:

The technical feature in Claim 1 is denoted by applicants' claim language, namely, "any one of blm open
reading frames (ORFs) 8 through 41" (emphasis added), indicating that each individual ORF is an invention since any
ONE open reading frame satisfies the claim. However, this technical feature is not a "special technical feature" within
the meaning of PCT Rule 13.2 because it fails to distinguish over the prior art for the reasons set forth below.

At least 2 inventions in claim 1, and possibly more, do not contribute over the prior art upon a cursory prior
art search for the purposes of defining the unity of invention. Sugiyama et al. (Gene 151 (1994) 11-16) and Calcutt et
al. (Gene 151 (1994) 17-21) teach a 14.4 kb plasmid, pMSA-1 (see Sugiyama et al., page 13, Fig. 2), which contains
the bleomycin resistance region from Streptomyces verticillus (ORFs 1-7). Calcutt et al. further suggest "resistance
and production functions may be clustered" (see page 21). Applicants identify ORF 41 as being located between ORF 1
and ORF 3 (see instant specification Fig. 13), and thus Sugiyama et al. inherently teaches the equivalent of ORF 41 in
their pMSA-1. Additionally, applicants' own specification positions blmI (ORF 10) approximately 4 kb upstream of
blmA as defined by Sugiyama et al. and Calcutt et al. (see instant specification, page 45, line 21). Clearly, in applicants'
Fig. 2, blmC (1.5 kb) and ORF 8 (1.58 kb) are downstream (closer to blmA) of blmI, indicating that these ORFs are
within 1.22 kb of blmA, and at least 3.8 kb upstream of blmA is taught in the pMSA-1 plasmid, which would encompass
all of ORF 8 and most of blmC.

Each technical feature in claim 1, i.e. ORFs 8 through 41, encodes a unique polypeptide with a unique
function with respect to the other ORFs, as supported by applicants' specification in Tables I and II, which note the
distinct enzymatic activities of the disclosed ORFs. Thus, ORFs 8 through 41 lack not only the same, but also a
corresponding special technical feature.

The 34 inventions in Claim 1, as defined by the 34 ORFs, are 34 isolated nucleic acids which, when
considered as a whole, do not contribute a common special technical feature over the prior art. These 34 inventions are
merely products, namely nucleic acids, which share the basic chemical construction of nucleic acids (deoxyribose sugar
and phosphate backbone with one of five nucleoside bases attached). While these inventions may share utility, when
coupled all together, for the production of bleomycin, that utility cannot be considered a special technical feature since it
is neither expressly claimed nor clearly identified in Claim 1.

2. The Groups within Supergroup B (Groups XXXV-LXVIII) lack unity of invention for the following reasons:

The methods of Supergroup B lack unity because said methods use wholly different reagents (different
polypeptides encoded by different ORFs) to produce wholly different products (bleomycin analogs). In particular, the
use of any one ORF in the methods of Supergroup B renders a distinct product because polypeptides encoded by each of
ORFs 8 through 41, as represented in Groups XXXV-LXVIII, have different and distinct functions (see instant
specification Tables I and II).

3. The Groups within Supergroup C (Groups LXIX-LXXVII) lack unity of invention for reasons analogous to
those stated in section 2 above pertaining to Supergroup B.

4. The Groups within Supergroup D (Groups LXXVIII-CXI) lack unity of invention for reasons analogous to
those stated in section 2 above pertaining to Supergroup B.

5. The Groups within Supergroup E (Groups CXII-CXLV) lack unity of invention for reasons analogous to those
stated in section 2 above pertaining to Supergroup B.

6. Supergroup F contains only one group, Group CXLVI; however, it is named a Supergroup for consistency.
No lack of unity is found within Supergroup F.

7. Each member of Supergroup A lacks unity of invention with each member of Supergroups B-F for the
following reasons:

The technical feature of each ORF in claim 1 is a nucleic acid which can encode a polypeptide used in one of
Groups XXXV-CXLVI. However, the methods can be practiced in the absence of the isolated nucleic acids and, in fact,
are practiced by bleomycin-producing bacteria. Thus, the nucleic acids and the methods of using the encoded proteins
lack unity of invention.

8. Each member of Supergroups B-F lacks unity of invention with every other member because each Group
within Supergroups B-F produces a wholly different and distinct product. These products can be bleomycin analogs,
dipeptides, 2 component fatty acids, and numerous permutations of tri, tetra, etc. of said products. Thus, all
these method groups lack unity of invention with each other.